2 datasets found
  1. open_subtitles_multilingual

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Daniel Wurgaft (2024). open_subtitles_multilingual [Dataset]. https://huggingface.co/datasets/DanielWurgaft/open_subtitles_multilingual
    Authors
    Daniel Wurgaft
    Description

    This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

    IMPORTANT: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!

    This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

    62 languages, 1,782 bitexts
    total number of files: 3,735,070
    total number of tokens: 22.10G
    total number of sentence fragments: 3.35G

  2. open_subtitles

    • huggingface.co
    • modeldatabase.com
    • +1 more
    Updated Dec 10, 2020
    Cite
    Language Technology Research Group at the University of Helsinki (2020). open_subtitles [Dataset]. https://huggingface.co/datasets/Helsinki-NLP/open_subtitles
    Dataset authored and provided by
    Language Technology Research Group at the University of Helsinki
    License

    https://choosealicense.com/licenses/unknown/

    Description

    This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

    IMPORTANT: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!

    This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

    62 languages, 1,782 bitexts
    total number of files: 3,735,070
    total number of tokens: 22.10G
    total number of sentence fragments: 3.35G
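
The corpus figures quoted in the description can be related to each other with some simple arithmetic. The sketch below is only an illustration and rests on two assumptions that the listing does not state: that "G" means 10^9, and that each bitext corresponds to one unordered pair of languages. The function name `corpus_ratios` is made up for this example.

```python
from math import comb

# Figures quoted in the dataset description.
LANGUAGES = 62
BITEXTS = 1_782
FILES = 3_735_070
TOKENS = 22.10e9     # assumption: "22.10G" means 22.10 * 10^9
FRAGMENTS = 3.35e9   # assumption: "3.35G" means 3.35 * 10^9


def corpus_ratios():
    """Derive a few rough ratios from the quoted corpus statistics."""
    # Assumption: a bitext is one unordered language pair, so the
    # upper bound on bitexts is "62 choose 2".
    possible_pairs = comb(LANGUAGES, 2)
    return {
        "possible_pairs": possible_pairs,
        "pair_coverage": BITEXTS / possible_pairs,
        "tokens_per_fragment": TOKENS / FRAGMENTS,
        "tokens_per_file": TOKENS / FILES,
        "fragments_per_file": FRAGMENTS / FILES,
    }


if __name__ == "__main__":
    for name, value in corpus_ratios().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, the 1,782 bitexts cover most of the 1,891 possible language pairs, and a sentence fragment averages between six and seven tokens. Because the token and fragment counts are rounded in the description, the ratios are approximate.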

