This is a new collection of translated movie subtitles from http://www.opensubtitles.org/. It is a slightly cleaner version of the subtitle collection, with improved sentence alignment and better language checking.

IMPORTANT: If you use the OpenSubtitles corpus, please add a link to http://www.opensubtitles.org/ to your website and to any reports and publications produced with the data.

Statistics: 62 languages; 1,782 bitexts; total number of files: 3,735,070; total number of tokens: 22.10G; total number of sentence fragments: 3.35G.
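The published totals can be related to each other with a little arithmetic. The Python sketch below is a rough illustration and is not part of the corpus release. It rests on three assumptions:

- "G" means 10^9.
- Each bitext covers a distinct unordered language pair.
- The file, token, and fragment totals are summed over all languages.

Under those assumptions, 62 languages allow 62 × 61 / 2 = 1,891 possible pairs. The 1,782 bitexts would therefore cover roughly 94% of them, and an average sentence fragment would be about 6.6 tokens long. The helpers `possible_pairs` and `derived_stats` are hypothetical names introduced only for this example.

```python
# Totals as stated in the corpus description.
# ASSUMPTION: "G" is read as 10**9; the published totals are already rounded.
LANGUAGES = 62
BITEXTS = 1_782
FILES = 3_735_070
TOKENS = 22.10e9
FRAGMENTS = 3.35e9


def possible_pairs(n: int) -> int:
    """Hypothetical helper: number of unordered pairs among n languages, n*(n-1)/2."""
    return n * (n - 1) // 2


def derived_stats() -> dict:
    """Hypothetical helper: averages and ratios derived from the totals above.

    ASSUMPTION: each bitext is one distinct unordered language pair, and
    files/tokens/fragments are counted over the whole collection.
    """
    pairs = possible_pairs(LANGUAGES)
    return {
        "possible_pairs": pairs,
        "pair_coverage": BITEXTS / pairs,         # share of pairs with a bitext
        "tokens_per_fragment": TOKENS / FRAGMENTS,
        "tokens_per_file": TOKENS / FILES,
        "fragments_per_file": FRAGMENTS / FILES,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:,.2f}")
```

Because the published totals are rounded, the derived averages are only approximate.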
License: unknown (https://choosealicense.com/licenses/unknown/)