100+ datasets found
  1. open_subtitles

    • huggingface.co
    Updated May 13, 2024
    Helsinki-NLP Research Group (2024). open_subtitles [Dataset]. https://huggingface.co/datasets/Helsinki-NLP/open_subtitles
    Dataset updated
    May 13, 2024
    Dataset authored and provided by
    Helsinki-NLP Research Group
    License

    https://choosealicense.com/licenses/unknown/

    Description

    This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

    IMPORTANT: If you use the OpenSubtitles corpus, please add a link to http://www.opensubtitles.org/ to your website and to any reports and publications produced with the data!

    This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

    62 languages, 1,782 bitexts
    Total number of files: 3,735,070
    Total number of tokens: 22.10G
    Total number of sentence fragments: 3.35G

  2. English Movie Subtitle Dataset

    • kaggle.com
    zip
    Updated May 9, 2023
    Shadim Sadiq (2023). English Movie Subtitle Dataset [Dataset]. https://www.kaggle.com/datasets/shadimsadiq/english-movie-subtitle-dataset
    Available download formats: zip (43,167,342 bytes)
    Dataset updated
    May 9, 2023
    Authors
    Shadim Sadiq
    Description

    Subtitles are a text representation of the spoken dialogue and other relevant audio information in a video, such as background sounds or music. This dataset is likely to be useful for natural language processing (NLP) tasks, such as language modeling, sentiment analysis, and named entity recognition. It could also be used for machine learning tasks, such as text classification or clustering. With this dataset, researchers and developers can analyze the language used in movies, study how language evolves over time, and train models to perform various NLP tasks on movie subtitles.

  3. Open Subtitles Multilingual Translation

    • kaggle.com
    zip
    Updated Nov 26, 2023
    The Devastator (2023). Open Subtitles Multilingual Translation [Dataset]. https://www.kaggle.com/datasets/thedevastator/open-subtitles-multilingual-translation
    Available download formats: zip (403,304,423 bytes)
    Dataset updated
    Nov 26, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Open Subtitles Multilingual Translation

    Train Sequential Neural Networks in Nine Languages

    By Hugging Face Hub [source]

    About this dataset

    This dataset can be used to train neural network models to translate text between nine languages: Finnish, Hindi, Basque, Esperanto, French, Armenian, Bengali, Icelandic, and Russian. Each language's CSV file has three columns: an ID column; a meta column that describes the source of the sentence; and a translation column that contains the translated sentence. The aim is to provide a dataset suitable for training multilingual translation models.

    How to use the dataset

    This dataset is a great resource for anyone looking to build a translation model using neural networks. Here is a guide on how to use it:

    • Download the appropriate .csv files for the languages you need from the Kaggle dataset.
    • The data comes as CSV files with ID, meta, and translation columns in each row. The ID column holds integer values that identify each row. The meta column records where each sentence originated, so you can filter out sentences of doubtful origin if needed. The translation column contains the English sentence together with its equivalent in the other language (depending on which language file you are working with).
    • To train a neural network model you need enough training data. If language-pair subsets are available, experiment with them before assembling your final training set. This Kaggle dataset should provide sufficient sample sizes for each language pair, so download the subsets you need before proceeding.
    • Next, build the input feature vectors for your neural network by gathering the relevant variables into separate lists or arrays, according to your preferred coding approach, and use them when setting up the layers of your network architecture.
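    The steps above can be sketched in Python. This is a minimal sketch, not the dataset author's code: it assumes lowercase column names `id`, `meta`, and `translation`, and assumes each `translation` cell is a Python-style dict literal keyed by language code (for example `{'en': 'Hello.', 'fi': 'Hei.'}`). Both are assumptions to check against the actual files; the sample rows and the `year` meta field are hypothetical.

```python
import ast
import csv
import io
from typing import Callable, Iterator, Tuple

# Hypothetical sample in the assumed layout: id, meta, translation.
SAMPLE = """id,meta,translation
0,"{'year': 1999}","{'en': 'Hello.', 'fi': 'Hei.'}"
1,"{'year': 2005}","{'en': 'Thank you.', 'fi': 'Kiitos.'}"
"""


def read_pairs(
    csv_text: str,
    src: str,
    tgt: str,
    keep_meta: Callable[[str], bool] = lambda meta: True,
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from id/meta/translation CSV text."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Let the caller drop rows whose recorded origin looks suspect.
        if not keep_meta(row["meta"]):
            continue
        # literal_eval parses the dict literal without executing arbitrary code.
        translation = ast.literal_eval(row["translation"])
        if src in translation and tgt in translation:
            yield translation[src], translation[tgt]


if __name__ == "__main__":
    print(list(read_pairs(SAMPLE, "en", "fi")))
    # Example filter: keep only subtitles from 2000 onward (hypothetical meta field).
    recent = lambda meta: ast.literal_eval(meta).get("year", 0) >= 2000
    print(list(read_pairs(SAMPLE, "en", "fi", keep_meta=recent)))
```

    Passing the filter as a function keeps the reader independent of whatever the meta column actually contains.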

    Research Ideas

    • Creating a neural network to automatically translate texts from any of the 9 languages in this dataset into any other language.
    • Developing an AI-powered chatbot that can reply in multiple languages that the users prefer.
    • Building an automatic translation system with real-time video conversation capabilities for use by professionals such as interpreters and international translators.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and the data source.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication (https://creativecommon...

  4. subscene

    • huggingface.co
    Updated Mar 12, 2025
    REFINE ai (2025). subscene [Dataset]. https://huggingface.co/datasets/refine-ai/subscene
    Dataset updated
    Mar 12, 2025
    Dataset authored and provided by
    REFINE ai
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Subscene is a large collection of multilingual subtitles covering 65 languages, with more than 30 billion tokens and a total size of 410.70 GB. It includes subtitles for movies, series, and animations gathered from the Subscene dump, and is a rich resource for studying language variation and building multilingual NLP models. We applied a fastText language-identification classifier to remove content in the wrong language from each language subset, and performed basic cleaning and filtering. There is still room for further cleaning and refinement.
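    The language-filtering step can be sketched as below. This is not REFINE ai's pipeline: the description does not say which fastText model or threshold was used, so the sketch assumes a fastText language-identification model such as the publicly released `lid.176.bin`, and the 0.5 confidence threshold and toy classifier are hypothetical. The classifier is passed in as a function so the filter can run without the model file.

```python
from typing import Callable, Iterable, List, Tuple

# A classifier maps one line of text to (language code, confidence).
Classifier = Callable[[str], Tuple[str, float]]


def fasttext_classifier(model_path: str) -> Classifier:
    """Wrap a fastText language-ID model file (e.g. lid.176.bin, an assumption)."""
    import fasttext  # third-party: pip install fasttext

    model = fasttext.load_model(model_path)

    def classify(text: str) -> Tuple[str, float]:
        # fastText predicts on a single line, so newlines are replaced first.
        labels, probs = model.predict(text.replace("\n", " "), k=1)
        return labels[0].replace("__label__", "", 1), float(probs[0])

    return classify


def filter_subset(
    lines: Iterable[str], expected_lang: str, classify: Classifier, min_prob: float = 0.5
) -> List[str]:
    """Keep lines the classifier assigns to expected_lang with enough confidence."""
    kept = []
    for line in lines:
        lang, prob = classify(line)
        if lang == expected_lang and prob >= min_prob:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Toy stand-in classifier so the sketch runs without downloading a model.
    def toy(text: str) -> Tuple[str, float]:
        return ("fr", 0.9) if "bonjour" in text.lower() else ("en", 0.8)

    print(filter_subset(["Bonjour tout le monde", "Hello there"], "fr", toy))
```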

  5. yyets-subtitles

    • huggingface.co
    Updated Dec 4, 2025
    chenrm (2025). yyets-subtitles [Dataset]. https://huggingface.co/datasets/chenrm/yyets-subtitles
    Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 4, 2025
    Authors
    chenrm
    Description

    chenrm/yyets-subtitles dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Subtitles Dataset

    • universe.roboflow.com
    zip
    Updated Oct 9, 2022
    subtitles (2022). Subtitles Dataset [Dataset]. https://universe.roboflow.com/subtitles-jtdc8/subtitles-xmseb/model/3
    Available download formats: zip
    Dataset updated
    Oct 9, 2022
    Dataset authored and provided by
    subtitles
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Letters Bounding Boxes
    Description

    Subtitles

    ## Overview
    
    Subtitles is a dataset for object detection tasks - it contains Letters annotations for 500 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  7. YouTube Video Statistics and Subtitles Dataset

    • kaggle.com
    zip
    Updated Jul 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hamza (2023). YouTube Video Statistics and Subtitles Dataset [Dataset]. https://www.kaggle.com/datasets/hamza3692/youtube-video-statistics-and-subtitles-dataset
    Available download formats: zip (11,716,047 bytes)
    Dataset updated
    Jul 3, 2023
    Authors
    Hamza
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    YouTube
    Description

    The YouTube Insights dataset offers valuable data for researchers, data scientists, and YouTube enthusiasts to explore video performance and engagement. This dataset focuses on key elements such as video titles, view counts, analytics, and subtitles.

    With a wide range of YouTube videos, spanning various genres and upload dates, this dataset provides insights into video popularity and audience engagement. Researchers can analyze video titles to understand effective strategies for capturing viewer attention. View counts offer quantitative measures of video popularity, while analytics data provides metrics like likes, dislikes, comments, and shares.

    The inclusion of subtitles enhances the dataset, enabling language pattern analysis, sentiment analysis, and keyword extraction. Researchers can uncover correlations between subtitles and video content to gain a deeper understanding of audience preferences and behavior.

    The YouTube Insights dataset empowers users to discover valuable insights into YouTube's ecosystem, optimizing content creation and engagement strategies. It serves as a foundation for research, analysis, and innovation in the realm of online video platforms.

  8. YouTube-Subtitles

    • huggingface.co
    Updated Apr 16, 2025
    Language Technologies, Bangor University (2025). YouTube-Subtitles [Dataset]. https://huggingface.co/datasets/techiaith/YouTube-Subtitles
    Dataset updated
    Apr 16, 2025
    Dataset authored and provided by
    Language Technologies, Bangor University
    License

    Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    techiaith/YouTube-Subtitles dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. ParTree - Parallel Treebanks: A multilingual corpus of movie subtitles.

    • doi.org
    • swissubase.ch
    Updated Mar 21, 2023
    (2023). ParTree - Parallel Treebanks: A multilingual corpus of movie subtitles. [Dataset]. http://doi.org/10.48656/5mz4-x435
    Dataset updated
    Mar 21, 2023
    Description

    A multilingual corpus of movie subtitles aligned at the sentence level. It contains data for more than 50 languages, with a focus on the Indo-European language family. Morphosyntactic annotation (part of speech, features, dependencies) in Universal Dependencies (UD) style is available for 47 languages.
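    If the annotated data is distributed in the CoNLL-U format used by Universal Dependencies (an assumption; check the files you download), a minimal reader can look like this. It follows the ten tab-separated CoNLL-U columns, skips comment lines, multiword-token ranges, and empty nodes, and keeps only a few fields. The English sample sentence is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    id: int
    form: str
    upos: str
    feats: Dict[str, str]
    head: int
    deprel: str


def parse_conllu(text: str) -> List[List[Token]]:
    """Split CoNLL-U text into sentences of Token objects."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:  # multiword ranges and empty nodes
            continue
        # FEATS is "_" or a "|"-separated list of Name=Value pairs.
        feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
        current.append(Token(int(cols[0]), cols[1], cols[3], feats, int(cols[6]), cols[7]))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


# Hypothetical one-sentence sample.
SAMPLE = (
    "# text = I see.\n"
    "1\tI\tI\tPRON\tPRP\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tsee\tsee\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for token in parse_conllu(SAMPLE)[0]:
        print(token.id, token.form, token.upos, token.head, token.deprel)
```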

  10. French Conversations (from movie subtitles)

    • kaggle.com
    zip
    Updated Aug 3, 2023
    Dali Selmi (2023). French Conversations (from movie subtitles) [Dataset]. https://www.kaggle.com/datasets/daliselmi/french-conversational-dataset
    Available download formats: zip (2,880,370,702 bytes)
    Dataset updated
    Aug 3, 2023
    Authors
    Dali Selmi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    French
    Description

    French Movie Subtitle Conversations Dataset

    Description

    Dive into the world of French dialogue with the French Movie Subtitle Conversations dataset – a comprehensive collection of over 127,000 movie subtitle conversations. This dataset offers a deep exploration of authentic and diverse conversational contexts spanning various genres, eras, and scenarios. It is thoughtfully organized into three distinct sets: training, testing, and validation.

    Content Overview

    Each conversation in this dataset is structured as a JSON object, featuring three key attributes:

    1. Context: Get a holistic view of the conversation's flow with the preceding 9 lines of dialogue. This context provides invaluable insights into the conversation's dynamics and contextual cues.
    2. Knowledge: Immerse yourself in a wide range of thematic knowledge. This dataset covers an array of topics, ensuring that your models receive exposure to diverse information sources for generating well-informed responses.
    3. Response: Explore how characters react and respond across various scenarios. From casual conversations to intense emotional exchanges, this dataset encapsulates the authenticity of genuine human interaction.

    Data Sample

    Here's a snippet from the dataset to give you an idea of its structure:

    [
     {
      "context": [
       "Tu as attendu longtemps?",
       "Oui en effet.",
       "Je pense que c' est grossier pour un premier rencard.",
       // ... (6 more lines of context)
      ],
      "knowledge": "",
      "response": "On n' avait pas dit 9h?"
     },
     // ... (more data samples)
    ]
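    A minimal sketch for turning records in this structure into (prompt, response) pairs for dialogue training. It assumes each split is a JSON array shaped like the sample above (without the `//` placeholder comments, which are not valid JSON); the file names of the training, testing, and validation sets are not given here, so the sketch takes the JSON text directly. Prefixing non-empty `knowledge` to the prompt and joining turns with newlines are illustrative choices, not part of the dataset.

```python
import json
from typing import List, Tuple

# Hypothetical sample reusing lines from the snippet above (shortened context).
SAMPLE = json.dumps([
    {
        "context": ["Tu as attendu longtemps?", "Oui en effet."],
        "knowledge": "",
        "response": "On n' avait pas dit 9h?",
    }
])


def to_training_pairs(raw_json: str, max_context: int = 9, sep: str = "\n") -> List[Tuple[str, str]]:
    """Build (prompt, response) pairs from context/knowledge/response records."""
    pairs = []
    for record in json.loads(raw_json):
        # Keep only the most recent turns; guard 0 because [-0:] would keep everything.
        turns = record["context"][-max_context:] if max_context > 0 else []
        prompt = sep.join(turns)
        knowledge = record.get("knowledge", "")
        if knowledge:
            prompt = knowledge + sep + prompt
        pairs.append((prompt, record["response"]))
    return pairs


if __name__ == "__main__":
    for prompt, response in to_training_pairs(SAMPLE):
        print(repr(prompt), "->", repr(response))
```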
    

    Use Cases

    The French Movie Subtitle Conversations dataset serves as a valuable resource for several applications:

    • Conversational AI: Train advanced chatbots and dialogue systems in French that can engage users in fluid, contextually aware conversations.
    • Language Modeling: Enhance your language models by leveraging diverse dialogue patterns, colloquialisms, and contextual dependencies present in real-world conversations.
    • Sentiment Analysis: Investigate the emotional tones of conversations across different movie genres and periods, contributing to a better understanding of sentiment variation.

    Why This Dataset

    • Size and Diversity: With a vast collection of over 127,000 conversations spanning diverse genres and tones, this dataset offers an unparalleled breadth and depth in French dialogue data.
    • Contextual Richness: The inclusion of context empowers researchers and practitioners to explore the dynamics of conversation flow, leading to more accurate and contextually relevant responses.
    • Real-world Relevance: Originating from movie subtitles, this dataset mirrors real-world interactions, making it a valuable asset for training models that understand and generate human-like dialogue.

    Acknowledgments

    We extend our gratitude to the movie subtitle community for their contributions, which have enabled the creation of this diverse and comprehensive French dialogue dataset.

    Unlock the potential of authentic French conversations today with the French Movie Subtitle Conversations dataset. Engage in state-of-the-art research, enhance language models, and create applications that resonate with the nuances of real dialogue.

  11. IndicDialogue Dataset

    • data.mendeley.com
    Updated Jun 11, 2024
    Noor Mairukh Khan Arnob (2024). IndicDialogue Dataset [Dataset]. http://doi.org/10.17632/wcb4bxbyxx.2
    Dataset updated
    Jun 11, 2024
    Authors
    Noor Mairukh Khan Arnob
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    The IndicDialogue dataset contains raw subtitle SRT files and the dialogues extracted from them. The subtitles are in 10 Indic languages: Hindi, Bengali, Marathi, Telugu, Tamil, Urdu, Odia, Sindhi, Nepali, and Assamese. The dataset provides a corpus for performing various NLP tasks in low-resource languages using small language models (SLMs) and large language models (LLMs).
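    A minimal SRT reader, as a sketch of the first step toward extracting dialogue from the raw files. It assumes the standard SRT layout (cue number, `start --> end` timing line, one or more text lines, blank line); how the dataset authors actually extracted their dialogues is not described here, and the Hindi sample below is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches HH:MM:SS,mmm (a dot is also accepted as the millisecond separator).
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(stamp: str) -> int:
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(text: str) -> List[Cue]:
    """Parse SRT text into cues; multi-line cue text is joined with spaces."""
    text = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", text):  # cues are separated by blank lines
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit() or "-->" not in lines[1]:
            continue  # skip malformed blocks instead of failing the whole file
        start, end = lines[1].split("-->", 1)
        # Some files append position info after the end time, so keep its first token.
        body = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), _to_ms(start), _to_ms(end.split()[0]), body))
    return cues


# Hypothetical two-cue sample.
SAMPLE = (
    "1\n00:00:01,000 --> 00:00:04,500\nनमस्ते!\n\n"
    "2\n00:00:05,000 --> 00:00:06,250\nकैसे हो?\nठीक हूँ।\n"
)

if __name__ == "__main__":
    for cue in parse_srt(SAMPLE):
        print(cue.index, cue.start_ms, cue.end_ms, cue.text)
```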

  12. Russian - Belarusian subtitles dataset for MT

    • kaggle.com
    zip
    Updated May 15, 2025
    Aleksandr Kliukin (2025). Russian - Belarusian subtitles dataset for MT [Dataset]. https://www.kaggle.com/datasets/aleksanderrb/russian-belarusian-subtitles-dataset-for-mt
    Available download formats: zip (4,651,007 bytes)
    Dataset updated
    May 15, 2025
    Authors
    Aleksandr Kliukin
    Description

    Dataset

    This dataset was created by Aleksandr Kliukin

    Contents

  13. AI Subtitle Generation Market Research Report 2034

    • dataintelo.com
    csv, pdf, pptx
    Updated Mar 21, 2026
    Dataintelo (2026). AI Subtitle Generation Market Research Report 2034 [Dataset]. https://dataintelo.com/report/ai-subtitle-generation-market
    Available download formats: pdf, csv, pptx
    Dataset updated
    Mar 21, 2026
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2025 - 2034
    Area covered
    Global
    Description

    Key Takeaways: AI Subtitle Generation Market

    • Global AI subtitle generation market valued at $3.8 billion in 2025
    • Expected to reach $18.6 billion by 2034 at a CAGR of 19.3%
    • Software component held the largest share at 62.4% in 2025
    • North America dominated with 38.7% revenue share in 2025
    • Key drivers: surge in video content consumption, accessibility compliance regulations, and growth in multilingual content demand
    • Rev.com led the competitive landscape with the broadest enterprise client base
    • Report spans 2025 to 2034 with 294+ pages of comprehensive analysis

    AI Subtitle Generation Market Outlook 2025-2034

    The global AI subtitle generation market was valued at $3.8 billion in 2025 and is projected to reach $18.6 billion by 2034, expanding at a compound annual growth rate (CAGR) of 19.3% during the forecast period from 2026 to 2034, driven by an unprecedented surge in digital video content, tightening accessibility regulations across major economies, and the rapid maturation of deep learning-based speech recognition technologies. The proliferation of over-the-top (OTT) streaming platforms, corporate video communications, and e-learning ecosystems has catalyzed demand for fast, accurate, and scalable subtitle generation solutions across virtually every industry vertical. Enterprises and content creators alike are increasingly abandoning manual captioning workflows in favor of AI-powered platforms that can deliver near real-time transcription at a fraction of the traditional cost. Advances in transformer-based language models, including architectures derived from OpenAI's Whisper and Google's Universal Speech Model (USM), have dramatically reduced word error rates (WER) to below 5% for major global languages, making AI-generated subtitles commercially viable for broadcast-grade applications. The integration of large language models (LLMs) with automatic speech recognition (ASR) engines has further enabled context-aware subtitle formatting, speaker diarization, and on-the-fly translation into more than 100 languages. Regulatory tailwinds such as the European Accessibility Act (EAA), scheduled for full enforcement in June 2025, and the U.S. Federal Communications Commission (FCC) mandates on video captioning have compelled media companies to invest heavily in automated captioning infrastructure. Simultaneously, the explosion of short-form video content on platforms such as TikTok, Instagram Reels, and YouTube Shorts has created a massive long-tail demand among individual content creators for quick, affordable subtitle solutions. 

    The market is also benefiting from the hybridization of AI models with human review workflows, where AI handles the heavy lifting at scale while human editors perform quality assurance, creating a services layer that is growing in parallel with pure software revenues.

    Market Size (2025): $3.8B
    Forecast (2034): $18.6B
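    As an arithmetic check of the report's headline figures (our calculation, not the report's), going from $3.8B in 2025 to $18.6B in 2034 spans nine annual steps, so the implied compound annual growth rate is (18.6 / 3.8)^(1/9) - 1 ≈ 0.193, matching the stated 19.3%. The helper names below are illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `start_value` at `rate` for `years` years."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    print(f"implied CAGR: {cagr(3.8, 18.6, 2034 - 2025):.3f}")
    print(f"2034 value at 19.3%: ${project(3.8, 0.193, 9):.1f}B")
```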

  14. YouTube Video Subtitles

    • kaggle.com
    zip
    Updated Feb 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Daniel Herman (2025). YouTube Video Subtitles [Dataset]. https://www.kaggle.com/datasets/jetakow/youtube-videos-subtitles
    Available download formats: zip (42,191,918 bytes)
    Dataset updated
    Feb 5, 2025
    Authors
    Daniel Herman
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    YouTube
    Description

    Over 12k scraped English (EN) YouTube subtitles for videos about GitHub topics.

    How? Based on the topics at https://github.com/topics, I searched YouTube with the phrase "What is {topic}?" and downloaded subtitles for up to 100 videos per topic. The extracted text is included in the dataset together with the topic name, video title, and video URL.

    Why? I want to know whether we can rate videos by their information value, especially when we use YouTube as an information source.

    You can find the source code here: https://github.com/detrin/text-info-value
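    The collection procedure described under "How?" can be sketched as follows. This is not the code from the linked repository; `search` and `fetch_subtitles` are hypothetical callables standing in for whatever YouTube search and subtitle-download tools are used, so the sketch only shows the query template, the cap of 100 downloaded subtitles per topic, and the output fields.

```python
from typing import Callable, Dict, Iterable, List, Optional

Video = Dict[str, str]  # assumed to carry at least "title" and "url"


def collect_subtitles(
    topics: Iterable[str],
    search: Callable[[str], Iterable[Video]],
    fetch_subtitles: Callable[[str], Optional[str]],
    per_topic: int = 100,
) -> List[Dict[str, str]]:
    """Search "What is {topic}?" and keep subtitles for up to per_topic videos."""
    rows = []
    for topic in topics:
        query = f"What is {topic}?"
        kept = 0
        for video in search(query):
            if kept >= per_topic:
                break
            text = fetch_subtitles(video["url"])
            if text:  # skip videos without subtitles
                rows.append({"topic": topic, "title": video["title"], "url": video["url"], "text": text})
                kept += 1
    return rows
```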

  15. french-conversations-from-movie-subtitles

    • huggingface.co
    Updated Aug 4, 2023
    daliselmi (2023). french-conversations-from-movie-subtitles [Dataset]. https://huggingface.co/datasets/daliselmi/french-conversations-from-movie-subtitles
    Dataset updated
    Aug 4, 2023
    Authors
    daliselmi
    Area covered
    French
    Description

    daliselmi/french-conversations-from-movie-subtitles dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. Subtitling and Captioning Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 5, 2026
    Data Insights Market (2026). Subtitling and Captioning Report [Dataset]. https://www.datainsightsmarket.com/reports/subtitling-and-captioning-1393307
    Available download formats: ppt, doc, pdf
    Dataset updated
    Feb 5, 2026
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2026 - 2034
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming subtitling and captioning market! This in-depth analysis reveals key trends, growth drivers, leading companies (SDI Media, IYUNO, Deluxe Media, ZOO Digital), and regional market shares from 2019-2033. Learn about the impact of AI and increasing demand for multilingual content.

  17. Movie Subtitle Dataset

    • kaggle.com
    zip
    Updated Aug 8, 2021
    Adiamaan (2021). Movie Subtitle Dataset [Dataset]. https://www.kaggle.com/adiamaan/movie-subtitle-dataset
    Available download formats: zip (254,871,718 bytes)
    Dataset updated
    Aug 8, 2021
    Authors
    Adiamaan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    💡 Motive

    I was thinking about movie sentiment and wanted to see whether there is a strong pattern linking how sentiment fluctuates across a movie to how that movie is received or performs.

    🍎 Lowest hanging fruit

    To track movie sentiment across the runtime, the easy way is to get the movie subtitles and identify the sentiment of each text in the subtitles. The advantage of this approach is that subtitles are easy to obtain, parse, and process, and NLP frameworks can readily help with the task. The approach also scales: whatever the original language, English subtitles are available for almost all movies, albeit with translation errors.
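    The approach described above can be sketched as: score each subtitle line, then average the scores within equal slices of the runtime to get a sentiment arc. This is a minimal sketch, not the dataset author's method; `toy_score` is a hypothetical word-list scorer used only to make the example runnable, and in practice it would be replaced by a real sentiment model, with cue start times taken from the parsed subtitle file.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sentiment_arc(
    cues: Sequence[Tuple[float, str]],
    runtime_s: float,
    score: Callable[[str], float],
    bins: int = 10,
) -> List[Optional[float]]:
    """Average sentiment per equal-length slice of the runtime (None if a slice has no lines)."""
    totals = [0.0] * bins
    counts = [0] * bins
    for start_s, text in cues:
        # Map the cue's start time to a slice, clamping into [0, bins - 1].
        i = min(max(int(start_s / runtime_s * bins), 0), bins - 1)
        totals[i] += score(text)
        counts[i] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


def toy_score(text: str) -> float:
    """Hypothetical lexicon scorer: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    positive = sum(w in {"good", "love", "happy"} for w in words)
    negative = sum(w in {"bad", "hate", "sad"} for w in words)
    return float(positive - negative)


if __name__ == "__main__":
    cues = [(10.0, "good"), (20.0, "bad"), (70.0, "good good")]
    print(sentiment_arc(cues, runtime_s=100.0, score=toy_score, bins=4))
```

    Averaging per slice rather than plotting every line keeps movies of different lengths and dialogue densities comparable.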

  18. Subtitles

    • huggingface.co
    Updated Apr 4, 2009
    Peanut Jar Mixers Development (2009). Subtitles [Dataset]. https://huggingface.co/datasets/PJMixers-Dev/Subtitles
    Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 4, 2009
    Dataset authored and provided by
    Peanut Jar Mixers Development
    Description

    PJMixers-Dev/Subtitles dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. Real-time Subtitles Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Jan 24, 2026
    Market Research Forecast (2026). Real-time Subtitles Report [Dataset]. https://www.marketresearchforecast.com/reports/real-time-subtitles-51386
    Available download formats: pdf, doc, ppt
    Dataset updated
    Jan 24, 2026
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2026 - 2034
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming real-time subtitles market! Explore its growth drivers, key trends, and leading companies shaping this dynamic industry. Learn about market size, segmentation, and regional variations in this comprehensive analysis of the 2025-2033 forecast.

  20. Captioning and Subtitling Service Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 7, 2026
    Data Insights Market (2026). Captioning and Subtitling Service Report [Dataset]. https://www.datainsightsmarket.com/reports/captioning-and-subtitling-service-1977895
    Available download formats: pdf, doc, ppt
    Dataset updated
    Feb 7, 2026
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2026 - 2034
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Explore the booming Captioning and Subtitling Service market, valued at USD 2.5 billion in 2025 and growing at a 10% CAGR. Discover key drivers like broadcast, streaming, and education, and understand regional market shares and future trends.

Helsinki-NLP/open_subtitles (OpenSubtitles): 8 scholarly articles cite this dataset (see Google Scholar).