2 datasets found

open_subtitles_multilingual
huggingface.co
Updated Sep 12, 2024
Cite
Daniel Wurgaft (2024). open_subtitles_multilingual [Dataset]. https://huggingface.co/datasets/DanielWurgaft/open_subtitles_multilingual
Authors
Daniel Wurgaft
Description
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

IMPORTANT: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!

This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
open_subtitles
huggingface.co
modeldatabase.com
Updated Dec 10, 2020
Cite
Language Technology Research Group at the University of Helsinki (2020). open_subtitles [Dataset]. https://huggingface.co/datasets/Helsinki-NLP/open_subtitles
Dataset authored and provided by
Language Technology Research Group at the University of Helsinki
License
https://choosealicense.com/licenses/unknown/
Description
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

IMPORTANT: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!

This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
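
To put the corpus statistics in the description in perspective, here is a minimal Python sketch that derives a few ratios from them. It rests on assumptions not stated in the listing: that "G" means billions, and that a bitext is one unordered pair of languages. The function name `corpus_summary` is hypothetical and exists only for this illustration.

```python
from math import comb

# Figures quoted in the dataset description.
LANGUAGES = 62
BITEXTS = 1_782
FILES = 3_735_070
TOKENS = 22.10e9     # "22.10G" read as 22.10 billion (assumption)
FRAGMENTS = 3.35e9   # "3.35G" read as 3.35 billion (assumption)


def corpus_summary():
    """Return ratios derived from the published corpus statistics."""
    # Number of unordered language pairs that 62 languages could form,
    # assuming one bitext corresponds to one such pair.
    possible_pairs = comb(LANGUAGES, 2)
    return {
        "possible_pairs": possible_pairs,
        "pair_coverage": BITEXTS / possible_pairs,
        "tokens_per_fragment": TOKENS / FRAGMENTS,
        "tokens_per_file": TOKENS / FILES,
    }


if __name__ == "__main__":
    for name, value in corpus_summary().items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions, the ratios suggest that most, but not all, possible language pairs have a bitext, and that sentence fragments are short on average, as one would expect for subtitle lines.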