Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks the annual diversity score from 2013 to 2023 for Russia Elementary School, compared with Ohio and the Russia Local School District.
Throughout the history of the Soviet Union, Russians were consistently the largest ethnic group in the USSR. Of a total population of 262 million people in 1979, over 137 million were Russian, or roughly 52 percent. In 1989, the total population of the Soviet Union was almost 286 million, with the ethnic Russian population at 145 million, or 51 percent. Following the dissolution of the Soviet Union in 1991, the Tatars were the only one of the ten largest ethnic groups not to be given their own independent country, with Tatarstan instead becoming one of Russia's federal republics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the authors’ materials on macrophyte diversity (macroscopic plants regardless of their taxonomic position) in rivers and streams of East European Russia and Western Siberia. These data were collected on 247 rivers and 32 streams in 13 administrative regions of the Russian Federation. The main portion of the data was obtained in water objects of the Vologda Region (5201 occurrences). In addition, there are data from Arkhangelsk Region (347 occurrences), Khanty-Mansi Autonomous Okrug (159), Yaroslavl Region (132), Novgorod Region (97), Kostroma Region (41), Republic of Karelia (31), Sverdlovsk Region (29), Komi Republic (28), Orenburg Region (26), Chelyabinsk Region (22), Voronezh Region (22), and Tyumen Region (18). Overall, the dataset contains materials on Plantae (6094) and Chromista (59) diversity. A total of 6153 occurrences (280 lower-rank taxa and 12 taxa identified to the genus level) are included in the dataset.
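As a quick consistency check, the per-region occurrence counts quoted above can be summed and compared with the stated total. The sketch below is illustrative only: the variable names are assumptions, and the figures are copied from the description rather than read from the dataset itself.

```python
# Occurrence counts per region, copied from the dataset description above.
# The variable names are illustrative and not part of the dataset.
occurrences_by_region = {
    "Vologda Region": 5201,
    "Arkhangelsk Region": 347,
    "Khanty-Mansi Autonomous Okrug": 159,
    "Yaroslavl Region": 132,
    "Novgorod Region": 97,
    "Kostroma Region": 41,
    "Republic of Karelia": 31,
    "Sverdlovsk Region": 29,
    "Komi Republic": 28,
    "Orenburg Region": 26,
    "Chelyabinsk Region": 22,
    "Voronezh Region": 22,
    "Tyumen Region": 18,
}

# Sum over the 13 administrative regions.
total = sum(occurrences_by_region.values())

# Sum over the two kingdoms reported (Plantae and Chromista).
kingdom_total = 6094 + 59

print(total, kingdom_total)
```

Both sums agree with the 6153 occurrences reported in the description.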
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cystic fibrosis (CF) is a common monogenic disease caused by pathogenic variants in the CFTR gene. The distribution and frequency of CFTR variants vary in different countries and ethnic groups. The spectrum of pathogenic variants of the CFTR gene was previously studied in more than 1,500 CF patients from different regions of the European and North Caucasian region of Russia and the spectrum of the most frequent pathogenic variants of the CFTR gene and ethnic features of their distribution were determined. To assess the population frequency of CFTR gene mutations some of the common variants were analyzed in the samples of healthy unrelated individuals from the populations of the European part of the Russian Federation: 1,324 Russians from four European regions (Pskov, Tver, Rostov, and Kirov regions), representatives of five indigenous ethnic groups of the Volga-Ural region [Mari (n = 505), Udmurts (n = 613), Chuvash (n = 780), Tatars (n = 704), Bashkirs (n = 517)], and six ethnic groups of the North Caucasus [Karachay (n = 324), Nogais (n = 118), Circassians (n = 102), Abazins (n = 128), Ossetians (n = 310), and Chechens (n = 100)]. The frequency of common CFTR mutations was established in studied ethnic groups. The frequency of F508del mutation in Russians was found to be 0.0056 on average, varying between four regions, from 0.0027 in the Pskov region to 0.0069 in the Rostov region. Three variants W1282X, 1677delTA, and F508del were identified in the samples from the North Caucasian populations: in Karachay, the frequency of W1282X mutation was 0.0092, 1677delTA mutation – 0.0032; W1282X mutation in the Nogais sample – 0.0127, the frequency of F508del mutations was 0.0098 and 1677delTA – 0.0098 in Circassians; in Abazins F508del (0.0039), W1282X (0.0039) and 1677delTA (0.0117) mutations were found. 
In the indigenous peoples of the Volga-Ural region, the maximum frequency of the F508del mutation was detected in the Tatar population (0.099), while this mutation was never detected in the Mari and Bashkir populations. The E92K variant was found in Chuvash and Tatar populations. Thus, interethnic differences in the spectra of CFTR gene variants were shown both in CF patients and in healthy population of the European and North Caucasian part of Russia.
Several studies examined the fine-scale structure of human genetic variation in Europe. However, the European sets analyzed represent mainly northern, western, central, and southern Europe. Here, we report an analysis of approximately 166,000 single nucleotide polymorphisms in populations from eastern (northeastern) Europe: four Russian populations from European Russia, and three populations from the northernmost Finno-Ugric ethnicities (Veps and two contrast groups of Komi people). These were compared with several reference European samples, including Finns, Estonians, Latvians, Poles, Czechs, Germans, and Italians. The results obtained demonstrated genetic heterogeneity of populations living in the region studied. Russians from the central part of European Russia (Tver, Murom, and Kursk) exhibited similarities with populations from central–eastern Europe, and were distant from the Russian sample from northern Russia (Mezen district, Archangelsk region). Komi samples, especially Izhemski Komi, were significantly different from all other populations studied. These can be considered as a second pole of genetic diversity in northern Europe (in addition to the pole occupied by Finns), as they had a distinct ancestry component. Russians from Mezen and the Finnic-speaking Veps were positioned between the two poles, but differed from each other in the proportions of Komi and Finnic ancestries. In general, our data provide a more complete genetic map of Europe accounting for the diversity in its most eastern (northeastern) populations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the median household income across different racial categories in Russia. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to gain insights into economic disparities and trends and explore the variations in median household income for diverse racial categories.
Key observations
Based on our analysis of the distribution of the Russia population by race and ethnicity, the population is predominantly White. This racial category accounts for 97.91% of the total residents in Russia. White is both the largest group and the one with the highest median household income, at $74,743.
[Figure: Russia median household income diversity across racial categories]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Russia median household income by race.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorized by gender (men and women) and by employment type, full-time (FT) or part-time (PT), offering insight into the income landscape within Russia town. The dataset can be utilized to gain insights into gender-based income distribution within the Russia town population, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income brackets:
Variables / Data Columns
Employment type classifications include:
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Russia town median household income by race.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The diversity and abundance of small soil oligochaetes (enchytraeids) were studied in the terrestrial ecosystems of various biomes within European Russia, part of the Northern Palaearctic. Soil samples were collected in the Russian part of the East European Plain, the Caucasus region, the Novaya Zemlya Archipelago, and Franz Josef Land. A total of 204 georeferenced sites were investigated, spanning 5 biomes classified by WWF (Olson et al., 2001): tundra; boreal forest; temperate broadleaf and mixed forest; temperate grassland, savanna and shrubland; and desert and xeric shrubland. This effort resulted in the collection of 73 species belonging to 17 genera of enchytraeids.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Russian General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Russian speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Russian communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Russian speech models that understand and respond to authentic Russian accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Russian. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
This dataset is a versatile resource for multiple Russian speech and language AI applications:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Global Register of Introduced and Invasive Species (GRIIS) presents validated and verified national checklists of introduced (alien) and invasive alien species at the country, territory, and associated island level.
Checklists are living entities, especially for biological invasions given the growing nature of the problem. GRIIS checklists are based on a published methodology and supported by the Integrated Publishing Tool that jointly enable ongoing improvements and updates to expand their taxonomic coverage and completeness.
Phase 1 of the project focused on developing validated and verified checklists of countries that are Party to the Convention on Biological Diversity (CBD). Phase 2 aimed to achieve global coverage including non-party countries and all overseas territories of countries, e.g. those of the Netherlands, France, and the United Kingdom.
All kingdoms of organisms occurring in all environments and systems are covered.
Checklists are reviewed and verified by networks of country or species experts. Verified checklists/ species records, as well as those under review, are presented on the online GRIIS website (www.griis.org) in addition to being published through the GBIF Integrated Publishing Tool.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Russia Bank Lending Conditions: Big Corporations: Loan Diversity data was reported at 1.000 % Point in Mar 2019. This records an increase from the previous number of -1.000 % Point for Dec 2018. Russia Bank Lending Conditions: Big Corporations: Loan Diversity data is updated quarterly, averaging -0.424 % Point from Jun 2009 (Median) to Mar 2019, with 40 observations. The data reached an all-time high of 22.321 % Point in Dec 2014 and a record low of -11.458 % Point in Mar 2010. Russia Bank Lending Conditions: Big Corporations: Loan Diversity data remains in active status in CEIC and is reported by The Central Bank of the Russian Federation. The data is categorized under Russia Premium Database’s Monetary and Banking Statistics – Table RU.KAC016: Bank Lending Tightness: Loans to Big Corporations.
The study of the fauna of fish parasites helps to understand how ichthyofauna forms and to obtain a more complete knowledge of the biodiversity of the aquatic ecosystem as a whole. Parasitologically, the Penzhina River, one of the largest and most inaccessible rivers in the Russian Far East, remained poorly studied for a long time. The Penzhina is characterized by an unusually extended mouth area; its estuary is distinguished by extremely high tides of up to 13.0 m, the highest in Russia. Rich ichthyofauna (21 species of fish and cyclostomes) and a variety of hydrological conditions favor the formation of a diverse fauna of fish parasites in the Penzhina River. The published parasitological data were fragmentary and concerned few host species, so they are significantly broadened by the authors’ findings and observations. The paper provides information on 122 species of fish parasites in the lower reaches and estuary of the Penzhina River, and Penzhinskaya Bay.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Russian Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Russian language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Russian speakers, drawing on various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale aids the language model in building a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Russian Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
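A minimal sketch of how records from the JSON release might be checked against the annotation fields listed above. The key names (`id`, `prompt_type`, `rich_text_presence`, and so on) and the sample record are hypothetical: the description lists the fields but not how they are spelled in the files.

```python
import json

# Hypothetical key names derived from the field list in the description;
# the actual keys in the released files may be spelled differently.
EXPECTED_FIELDS = [
    "id", "prompt", "prompt_type", "prompt_complexity", "prompt_category",
    "domain", "response", "rationale", "response_type", "rich_text_presence",
]


def missing_fields(record: dict) -> list:
    """Return the expected fields that are absent from one record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


# Made-up sample record for illustration only (not taken from the dataset).
sample = json.loads("""
{"id": "cot-0001", "prompt": "2 + 2 = ?", "prompt_type": "instruction",
 "prompt_complexity": "easy", "prompt_category": "arithmetic",
 "domain": "mathematics", "response": "4",
 "rationale": "Adding 2 and 2 gives 4.", "response_type": "numerical"}
""")

# Lists any expected field the sample record lacks.
print(missing_fields(sample))
```

Running a check like this over every record before training can surface incomplete rows early; the field list should be adjusted to match the keys actually present in the files.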
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Russian text is free of spelling and grammatical errors. No copyrighted, toxic, or harmful content was used in constructing this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Russian Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The database is a collection of information obtained in our own research in 2019-2024, as well as publications from different years with reliable information about geographical location. Geographical coverage - many regions of Russia.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study analyzes the HIV-1 subtype diversity and its phylodynamics in the Moscow region, which is the most densely populated area of Russia, characterized by high rates of internal and external migration. The demographic and viral data from 896 HIV-infected individuals collected during 2011–2016 were analyzed. The study revealed broad diversity in the HIV-1 subtypes found in Moscow, which included A6 (85.1%), B (7.6%), CRF02_AG (1.2%) and URF_A6/B recombinants (4.2%). Other HIV-1 subtypes were detected as single cases. While A6 was most prevalent (>86.0%) among heterosexuals, injecting drug users and cases of mother-to-child transmission of HIV, subtype B (76.3%) was more common in men who have sex with men. Phylogenetic reconstruction revealed that the A6 sequences were introduced into the epidemic cluster that arose around 1998. Within subtype B, six major epidemic clusters were identified, each of which contained strains associated with only one or two dominant transmission routes. The date of origin of these clusters varied between 1980 and 1993, indicating that the HIV-1 B epidemic began much earlier than the HIV-1 A6 epidemic. Reconstruction of the demographic history of subtypes A6 and B identified at least two epidemic growth phases, which included an initial phase of exponential growth followed by a decline in the mid/late 2010s. Thus, our results indicate an increase in HIV-1 genetic diversity in the Moscow region. They also help in understanding the HIV-1 temporal dynamics as well as the genetic relationships between its circulating strains.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Russian Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Russian speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
The dataset features 30 hours of dual-channel call center conversations between native Russian speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
This dataset can be used across a range of healthcare and voice AI use cases:
This digital archive contains interview data from Veps minority speakers in Russia. The archive consists of audio files (available in .wav format) containing individual and focus group interviews with speakers from four different age groups. All interview files were named using a special coding system. Each file name includes: a) the country where the research was conducted; b) the speech community studied; c) the form of the interview; d) the type of target group; e) the age group and gender; f) the date of the interview (DDMMYEAR). For the full list of files and code values, please refer to the descriptions under “ELDIAdata: Metadata”.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the digitized treatments in Plazi based on the original journal article Baturina, Maria A., Kaygorodova, Irina A., Loskutova, Olga A. (2020): New data on species diversity of Annelida (Oligochaeta, Hirudinea) in the Kharbey lakes system, Bolshezemelskaya tundra (Russia). ZooKeys 910: 43-78, DOI: http://dx.doi.org/10.3897/zookeys.910.48486, URL: http://dx.doi.org/10.3897/zookeys.910.48486
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
• articles.txt – Texts of popular articles on various topics published on dzen.ru (~20 million characters)
• books-A.txt – Fragments of various works of world-class Russian and foreign literature (~20 million characters)
• books-B.txt – Fragments of various works of literature, both world-famous and little-known (~20 million characters)
• fanfiction.txt – Texts of popular fanfiction on various topics published on ficbook.net (~20 million characters)
• jokes.txt – Texts of various jokes and puns (~6.7 million characters)
• poems.txt – Texts of various poems by world-famous authors (~40 million characters)
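The approximate sizes listed above could be checked after download by counting characters per file. This is a sketch under assumptions: it presumes the files are UTF-8 encoded and sit together in one directory, neither of which the description states, and `count_characters` is an illustrative helper name.

```python
from pathlib import Path

# File names as listed in the description above.
CORPUS_FILES = [
    "articles.txt", "books-A.txt", "books-B.txt",
    "fanfiction.txt", "jokes.txt", "poems.txt",
]


def count_characters(directory, encoding="utf-8"):
    """Count characters (not bytes) in each corpus file found in `directory`.

    Files that are not present are skipped. The UTF-8 default is an
    assumption, not something the description confirms.
    """
    counts = {}
    for name in CORPUS_FILES:
        path = Path(directory) / name
        if path.is_file():
            counts[name] = len(path.read_text(encoding=encoding))
    return counts


if __name__ == "__main__":
    # Report sizes for whichever corpus files exist in the current directory.
    for name, size in count_characters(".").items():
        print(f"{name}: {size / 1e6:.1f} million characters")
```

Counting characters rather than bytes matters here because Cyrillic letters occupy two bytes each in UTF-8, so the file size on disk will be noticeably larger than the character count.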
Specific local environmental and sociocultural conditions have led to the creation of various goat populations in Russia. National goat diversity includes breeds that have been selected for down and mohair production traits as well as versatile local breeds for which pastoralism is the main management system. Effective preservation and breeding programs for local goat breeds are missing due to the lack of DNA-based data. In this work, we analyzed the genetic diversity and population structure of Russian local goats, including Altai Mountain, Altai White Downy, Dagestan Downy, Dagestan Local, Karachaev, Orenburg, and Soviet Mohair goats, which were genotyped with the Illumina Goat SNP50 BeadChip. In addition, we addressed genetic relationships between local and global goat populations obtained from the AdaptMap project. Russian goats showed a high level of genetic diversity. Although a decrease in historical effective population sizes was revealed, the recent effective population sizes estimated for three generations ago were larger than 100 in all studied populations. The mean runs of homozygosity (ROH) lengths ranged from 79.42 to 183.94 Mb, and the average ROH number varied from 18 to 41. Short ROH segments (<2 Mb) were predominant in all breeds, while the longest ROH class (>16 Mb) was the least frequent. Principal component analysis, Neighbor-Net graph, and Admixture clustering revealed several patterns in Russian local goats. First, a separation of the Karachaev breed from other populations was observed. Moreover, genetic connections between the Orenburg and Altai Mountain breeds were suggested and the Dagestan breeds were found to be admixed with the Soviet Mohair breed. Neighbor-Net analysis and clustering of local and global breeds demonstrated the close genetic relations between Russian local and Turkish breeds that probably resulted from past admixture events through postdomestication routes.
Our findings contribute to the understanding of the genetic relationships of goats originating in West Asia and Eurasia and may be used to design breeding programs for local goats to ensure their effective conservation and proper management.