ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This repository contains a collection of Russian literature in plain-text (.txt) format, all in UTF-8 encoding. In addition, for each author there is a CSV file giving the year in which each work was written.
This dataset was created for an authorship-attribution project (determining who wrote a piece of text), but I'm sure you can use it for anything 😉.
The main feature that makes this dataset usable for any purpose is that the data has not been processed at all: the text has not been pre-processed in any way, and author names, chapter headings and references to translations of foreign-language inserts have not been removed.
Thanks to Ilibrary, LitLib, Wikisource and everyone else.
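A minimal sketch of how a per-author CSV file might be read alongside the texts. It assumes each CSV has a header row with the hypothetical columns `title` and `year` (a single integer year); the real column names and folder layout are not documented in this listing and should be checked against the files. Because the texts are unprocessed, the strings returned by `read_work` will still contain author names, chapter headings and translation notes.

```python
import csv
from pathlib import Path
from typing import Iterable

# Hypothetical column names: check the header row of the real CSV files.
TITLE_COL = "title"
YEAR_COL = "year"


def parse_years(lines: Iterable[str]) -> dict[str, int]:
    """Map each work's title to its year of writing, from CSV lines with a header row."""
    years: dict[str, int] = {}
    for row in csv.DictReader(lines):
        # Assumes the year column holds a single integer year.
        years[row[TITLE_COL].strip()] = int(row[YEAR_COL])
    return years


def load_years(csv_path: Path) -> dict[str, int]:
    """Read one author's CSV file and return {title: year}."""
    # "utf-8-sig" reads plain UTF-8 and also drops a byte-order mark if one is present.
    with open(csv_path, encoding="utf-8-sig", newline="") as f:
        return parse_years(f)


def read_work(txt_path: Path) -> str:
    """Return the raw text of one work; nothing is stripped or normalized."""
    return txt_path.read_text(encoding="utf-8")
```

Keeping `parse_years` separate from file handling makes it easy to test on in-memory lines and to reuse if the CSVs turn out to live inside an archive.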
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Russian literature and thought is a series of 11 books by 10 authors, published between 1995 and 2011.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Matvei Danilov
Released under Apache 2.0
Open data for St Petersburg, Russia, published by the Institute of Russian Literature Dataverse: 45 public datasets, resources, and metadata, available in Nixa.
A combined corpus of Russian books, which may be useful for training text-generation models such as GPT. I do not own any of these texts; they should probably be used for educational purposes only.
The Uppsala Corpus (Upsal'skij korpus russkix tekstov) consists of some 600 Russian texts with a total of one million running words (word tokens), equally divided between informative and literary prose. The informative texts are from between 1985 and 1989, while the literary texts, whose vocabulary does not date as quickly, cover a longer period, 1960-88. The corpus does not include poetry or drama.
Within the given framework, considerable effort has been made to make the corpus as representative and varied as possible. The informative texts are drawn from 25 different subject areas: economics, foreign affairs / foreign policy, ideology / domestic policy, party matters, Soviet society, social issues, defence, education, law, history, culture, linguistics, medicine / health care, psychology, environment / ecology, agriculture, engineering, information technology, space research, energy, biology, geology / geography, physics, chemistry and sport. Certain areas that were considered more important are represented by a larger volume of texts.
The literary half of the corpus comprises work by the following 40 authors: Abramov, Ajtmatov, Astaf'ev, Baklanov, Bek, Belov, Bitov, Bondarev, Dubov, Ganin, Gladyshev, Granin, Grekova, Goncharov, Iskander, Kaverin, Kazakov, Kochnev, Kozhevnikova, Nagibin, Lixanov, Lidin, Paustovskij, Pogodin, Pristavkin, Troepol'skij, Rasputin, Shcherbakova, Simonov, Solouxin, Shmelev, Tendrjakov, Tokareva, Tolstaja, Trifonov, Vasil'ev, Vorobl'ev, Zalygin and Zorin. Here, too, there is unequal representation, with a larger amount of writing by the better-known authors.
For further details about the corpus, see Lönngren, Lennart (ed.), 1993. Chastotnyj slovar' sovremennogo russkogo jazyka. (A Frequency Dictionary of Modern Russian. With a Summary in English.) Acta Universitatis Upsaliensis, Studia Slavica Upsaliensia 32. 188 pp. Uppsala. ISBN 91-554-3134-8.
Purpose:
The aim is to provide a corpus of Russian prose texts.
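The corpus size is stated in running words (word tokens), as opposed to distinct word forms (types). A minimal sketch of that distinction, assuming a deliberately simple tokenizer (runs of letters, optionally hyphenated); the corpus's own tokenization rules are described in Lönngren (1993) and may differ, so counts from this sketch need not match the published figures.

```python
import re
from collections import Counter

# Simple tokenizer (an assumption, not the corpus's own rules): a word is a
# run of letters, optionally joined by hyphens, as in "что-то".
# [^\W\d_] matches any Unicode letter, including Cyrillic; digits are skipped.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def word_frequencies(text: str) -> Counter:
    """Count lower-cased word tokens in a text (a tiny frequency list)."""
    return Counter(w.lower() for w in WORD_RE.findall(text))


def tokens_and_types(text: str) -> tuple[int, int]:
    """Return (number of running words, number of distinct word forms)."""
    freq = word_frequencies(text)
    return sum(freq.values()), len(freq)
```

For example, "Мама мыла раму, и мама устала." has six running words but only five distinct word forms, because "Мама" and "мама" are folded together.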
These data were used to produce the network graphs that accompany the chapter "Mapping the Networks of Crime and Punishment," published in Approaches to Teaching Dostoevsky's Crime and Punishment (ed. M. Katz and A. Burry, MLA Approaches to Teaching World Literature series, forthcoming in 2021). Graph: "Offstage" connections only.
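The network graphs themselves are not reproduced in this listing. As a minimal sketch, assuming the data can be read as a list of (character, character) connection pairs (the actual file format is not described here), an undirected character network and a simple degree ranking can be built in plain Python:

```python
from collections import defaultdict
from typing import Iterable


def build_network(edges: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map; repeated pairs and self-loops are ignored."""
    adj: defaultdict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a character is not connected to itself
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def degree_ranking(adj: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Characters sorted by number of distinct connections, ties broken by name."""
    return sorted(((name, len(nbrs)) for name, nbrs in adj.items()),
                  key=lambda item: (-item[1], item[0]))
```

Storing neighbours in sets means a pair listed twice, or in both directions, counts as one connection; a library such as NetworkX would be the natural next step for layout and drawing.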
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Winograd schema challenge consists of tasks involving syntactic ambiguity that can be resolved through logic and reasoning (Levesque et al., 2012).
The texts for the Winograd schema problem are obtained using a semi-automatic pipeline. First, a list of 11 typical grammatical structures exhibiting syntactic homonymy (mainly case homonymy) is compiled. For example, two noun phrases followed by a complex subordinate clause: 'A trinket from Pompeii that has survived the centuries'. Queries corresponding to these constructions are run against the Russian National Corpus, specifically its disambiguated sub-corpus, in which homonymy has been removed. In the resulting 2+k examples, the homonymy is resolved automatically and the results are then validated manually. Each original sentence is split into multiple examples in binary-classification format, each indicating whether the homonymy is resolved correctly or not.
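A minimal sketch of the last step, splitting one resolved sentence into binary-classification examples. The record fields (`reference`, `candidate`, `label`) and the function name are hypothetical illustrations, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryExample:
    text: str       # the original sentence
    reference: str  # the ambiguous word or phrase
    candidate: str  # one proposed resolution of the ambiguity
    label: int      # 1 if this candidate is the correct resolution, else 0


def split_into_examples(text: str, reference: str,
                        candidates: list[str], correct: str) -> list[BinaryExample]:
    """Turn one sentence with k candidate resolutions into k labelled examples."""
    if correct not in candidates:
        raise ValueError("the correct resolution must be one of the candidates")
    return [BinaryExample(text, reference, c, int(c == correct)) for c in candidates]
```

For the phrase 'A trinket from Pompeii that has survived the centuries', the candidates would be the two noun phrases the relative clause could attach to, yielding one positive and one negative example from a single sentence.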
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data material consists of a detailed description of a review corpus used to analyze the reception of Russian literature in Sweden. The investigations that have been and will be conducted based on the review corpus analyze, for example, translation visibility, translation criticism and the image of Russian literature in the Swedish literary system. The review corpus consists of 430 reviews of post-Soviet Russian novels published in Swedish translation between 1992 and 2020. The reviews are protected by copyright and may not be made available. Therefore, the data instead contains a complete specification of the review database, information regarding how the reviews have been classified, and finally, information about thematic coding related to specific investigations (articles).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article examines the phenomenon of Russian literary postmodernism: its chronological scope, key differences from its Western counterpart (its traumatic nature, its reaction to the collapse of the Soviet utopia), and its main aesthetic principles (intertextuality, irony, and demythologization). The author identifies the main trends (Moscow conceptualism, Leningrad metarealism, “other prose”) and key figures (Venedikt Erofeev, Vladimir Sorokin, Viktor Pelevin, Dmitry Prigov, Tatyana Tolstaya). Special attention is given to seminal texts (“Moscow—Petushki,” “Pushkin’s House,” “Chapaev and the Void”) and an analysis of the crisis of postmodernism in the 2000s with the transition to new literary strategies (metamodernism, new sincerity). The material is structured and suitable both for an introduction to the topic and for consolidating knowledge.
https://data.gov.tw/license
National Science and Technology Committee Literature II Discipline Project Subsidy List.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
• articles.txt – Texts of popular articles on various topics published on dzen.ru (~20 million characters)
• books-A.txt – Fragments of various works of world-class Russian and foreign literature (~20 million characters)
• books-B.txt – Fragments of various works of literature, both world-famous and little-known (~20 million characters)
• fanfiction.txt – Texts of popular fanfiction on various topics published on ficbook.net (~20 million characters)
• jokes.txt – Texts of various jokes and puns (~6.7 million characters)
• poems.txt – Texts of various poems by world-famous authors (~40 million characters)
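The sizes above are given in characters, not bytes: in UTF-8 each Cyrillic letter takes two bytes, so a file's size on disk will be noticeably larger than its character count. A minimal sketch for checking the counts, assuming the files are UTF-8 encoded (the listing does not state the encoding):

```python
from pathlib import Path

# File names as listed above; the UTF-8 encoding is an assumption.
FILES = ["articles.txt", "books-A.txt", "books-B.txt",
         "fanfiction.txt", "jokes.txt", "poems.txt"]


def char_count(path: Path, chunk_size: int = 1 << 20) -> int:
    """Count characters (not bytes), reading the file in chunks of chunk_size characters.

    Text mode normalizes Windows line endings to a single newline character.
    """
    total = 0
    with open(path, encoding="utf-8") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


def report(directory: Path) -> dict[str, int]:
    """Character count for each listed file that exists in the directory."""
    return {name: char_count(directory / name)
            for name in FILES if (directory / name).exists()}
```

Reading in fixed-size chunks keeps memory use flat even for the ~40-million-character poems.txt.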
These data were used to produce the network graphs that accompany the chapter "Mapping the Networks of Crime and Punishment," published in Approaches to Teaching Dostoevsky's Crime and Punishment (ed. M. Katz and A. Burry, MLA Approaches to Teaching World Literature series, forthcoming in 2021). Graph: Complete character network.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The article examines the evolution of the hero's image in 18th-century Russian literature in the context of changing artistic trends and the transformation of the era's aesthetic paradigms. It analyzes the movement from the normative-allegorical model of personality in classicism, to the socio-educational concept of a person, and further to the emotional-psychological concept of a person in sentimentalism. Drawing on the works of Alexander Sumarokov, Mikhail Lomonosov, Denis Fonvizin, Gavriil Derzhavin, Alexander Radishchev and Nikolai Karamzin, the article reveals structural changes in the hero's characterological organization, in the principles motivating his actions, and in the artistic means of representing his inner world.
https://creativecommons.org/publicdomain/zero/1.0/
And Quiet Flows the Don or Quietly Flows the Don (Russian: Тихий Дон, literally "The Quiet Don") is an epic novel in four volumes by Russian writer Mikhail Alexandrovich Sholokhov. The first three volumes were written from 1925 to 1932 and published in the Soviet magazine Oktyabr in 1928–1932, and the fourth volume was finished in 1940. The English translation of the first three volumes appeared under this title in 1934.
The novel is considered one of the most significant works of world and Russian literature in the 20th century. It depicts the lives and struggles of Don Cossacks during the First World War, the Russian Revolution, and the Russian Civil War. In 1965, Sholokhov was awarded the Nobel Prize for Literature for this novel.
source: https://en.wikipedia.org/wiki/And_Quiet_Flows_the_Don
The book is written in Russian with many dialectisms specific to the lower and middle Don basin.
The data is provided as a text file.
https://academictorrents.com/nolicensespecified
The "World Literature Summaries on Russian" dataset is a collection of concise summaries, in Russian, of literary works from around the world, including novels, plays, poems and short stories spanning various genres, time periods and cultural backgrounds. It is intended for researchers, students and literature enthusiasts who want to explore the plots, themes and characters of well-known works, whether for literary analysis, the study of world literature, or simply as a curated selection of summaries to accompany their reading.
Stihi.ru dataset
Description
Summary: A subset of Taiga, uploaded here for convenience. Additional cleaning was performed.
Script: create_stihi.py
Point of Contact: Ilya Gusev
Languages: Russian.
Usage
Prerequisites:

    pip install datasets zstandard jsonlines pysimdjson

Dataset iteration:

    from datasets import load_dataset

    dataset = load_dataset('IlyaGusev/stihi_ru', split="train", streaming=True)
    for example in dataset:
        print(example["text"])

… See the full description on the dataset page: https://huggingface.co/datasets/IlyaGusev/stihi_ru.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data material consists of a detailed description of a review corpus used in order to analyze the reception of Russian literature in Sweden. The reviews are protected by copyright and may not be made available. Therefore, the data instead contains a complete specification of the review database, information regarding how the reviews have been classified, and finally, information about the authors, translators, critics and media sources related to the material.