59 datasets found

Russian Speech Recognition Dataset - 338 Hours
kaggle.com
Updated Jun 30, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Unidata (2025). Russian Speech Recognition Dataset - 338 Hours [Dataset]. https://www.kaggle.com/datasets/unidpro/russian-speech-recognition-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 30, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Unidata
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Russian Speech Dataset for recognition task

Dataset comprises 338 hours of telephone dialogues in Russian, collected from 460 native speakers across various topics and domains, with an impressive 98% Word Accuracy Rate. It is designed for research in speech recognition, focusing on various recognition models, primarily aimed at meeting the requirements for automatic speech recognition (ASR) systems.

By utilizing this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR) systems, transcribing audio, and natural language processing (NLP). - Get the data

💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

Metadata for the dataset

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22059654%2Fa3f375fb273dcad3fe17403bdfccb63b%2Fssssssssss.PNG?generation=1739884059328284&alt=media" alt=""> - Audio files: High-quality recordings in WAV format - Text transcriptions: Accurate and detailed transcripts for each audio segment - Speaker information: Metadata on native speakers, including gender and etc - Topics: Diverse domains such as general conversations, business and etc

The native speakers and various topics and domains covered in the dataset make it an ideal resource for research community, allowing researchers to study spoken languages, dialects, and language patterns.

🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects
u
Russian Speech Recognition Dataset
unidata.pro
wav
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Unidata L.L.C-FZ, Russian Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/russian-speech-recognition-dataset/
Explore at:
wavAvailable download formats
Dataset authored and provided by
Unidata L.L.C-FZ
Description
Unidata provides a Russian Speech Recognition dataset to train AI for seamless speech-to-text conversion
E
Russian Speech Recognition Corpus (Desktop) - 25.85 hours
catalogue.elra.info
live.european-language-grid.eu
Updated Apr 7, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2020). Russian Speech Recognition Corpus (Desktop) - 25.85 hours [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0228_84/
Explore at:
Dataset updated
Apr 7, 2020
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdfhttps://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdfhttps://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
This corpus comprises 59,968 entries uttered by 50 speakers (25 males and 25 females), recorded over 4 channels (desktop in quiet office). Speech samples are stored as a sequence of 16-bit 44.1kHz for a total of 25.85 hours of speech per channel.
h
Golos
huggingface.co
Updated Sep 5, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SberDevices (2022). Golos [Dataset]. https://huggingface.co/datasets/SberDevices/Golos
Explore at:
Dataset updated
Sep 5, 2022
Authors
SberDevices
Description
Golos dataset

Golos is a Russian corpus suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on the crowd-sourcing platform. The total duration of the audio is about 1240 hours. We have made the corpus freely available for downloading, along with the acoustic model prepared on this corpus. Also we create 3-gram KenLM language model using an open Common Crawl corpus.

Dataset structure

Domain Train files Train hours… See the full description on the dataset page: https://huggingface.co/datasets/SberDevices/Golos.
230 Hours – Russian Speaking English Speech Data by Mobile Phone
nexdata.ai
Updated Oct 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2023). 230 Hours – Russian Speaking English Speech Data by Mobile Phone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1042
Explore at:
Dataset updated
Oct 31, 2023
Dataset authored and provided by
Nexdata
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition
Description
English(Russia) Scripted Monologue Smartphone speech dataset, collected from monologue based on given scripts, covering generic domain, human-machine interaction, smart home command and control, in-car command and control, numbers and other domains. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(498 people in total), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
F
Russian Call Center Data for Healthcare AI
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Russian Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-russian-russia
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Russian Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Russian speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
Speech Data
The dataset features 30 Hours of dual-channel call center conversations between native Russian speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
•Participant Diversity:
•
Speakers: 60 verified native Russian speakers from our contributor community.

•
Regions: Diverse provinces across Russia to ensure broad dialectal representation.

•
Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.

•RecordingDetails:
•
Conversation Nature: Naturally flowing, unscripted conversations.

•
Call Duration: Each session ranges between 5 to 15 minutes.

•
Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.

•
Recording Environment: Captured in clear conditions without background noise or echo.

Topic Diversity
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
•Inbound Calls:
•Appointment Scheduling
•New Patient Registration
•Surgical Consultation
•Dietary Advice and Consultations
•Insurance Coverage Inquiries
•Follow-up Treatment Requests, and more
•OutboundCalls:
•Appointment Reminders
•Preventive Care Campaigns
•Test Results & Lab Reports
•Health Risk Assessment Calls
•Vaccination Updates
•Wellness Subscription Outreach, and more
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Transcription
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
•Transcription Includes:
•Speaker-identified Dialogues
•Time-coded Segments
•Non-speech Annotations (e.g., silence, cough)
•High transcription accuracy with word error rate is below 5%, backed by dual-layer QA checks.
Metadata
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
•
Participant Metadata: ID, gender, age, region, accent, and dialect.

•
Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

Usage and Applications
This dataset can be used across a range of healthcare and voice AI use cases:
•
<b style="font-weight:
HENSOLDT ANALYTICS Speech-to-text for Russian
live.european-language-grid.eu
Updated Dec 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hensoldt Analytics (2021). HENSOLDT ANALYTICS Speech-to-text for Russian [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/9409
Explore at:
Dataset updated
Dec 20, 2021
Dataset provided by
Hensoldthttp://hensoldt.net/
Authors
Hensoldt Analytics
License
https://gitlab.com/european-language-grid/sail/sail-documents/blob/master/HENSOLDT-ANALYTICS_ELG_LICENSE.mdhttps://gitlab.com/european-language-grid/sail/sail-documents/blob/master/HENSOLDT-ANALYTICS_ELG_LICENSE.md
Description
HENSOLDT ANALYTICS MediaMiningIndexer ASR - automatic speech recognition speech-to-text engine that provides transcription of audio with spoken sentences into text with timestamps and confidence scores, in a variety of languages.
F
Russian General Domain Scripted Monologue Speech Data
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Russian General Domain Scripted Monologue Speech Data [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/general-scripted-speech-monologues-russian-russia
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The Russian Scripted Monologue Speech Dataset for the General Domain is a carefully curated resource designed to support the development of Russian language speech recognition systems. This dataset focuses on general-purpose conversational topics and is ideal for a wide range of AI applications requiring natural, domain-agnostic Russian speech data.
Speech Data
This dataset features over 6,000 high-quality scripted monologue recordings in Russian. The prompts span diverse real-life topics commonly encountered in general conversations and are intended to help train robust and accurate speech-enabled technologies.
•Participant Diversity
•
Speakers: 60 native Russian speakers

•
Regions: Broad regional coverage ensures diverse accents and dialects

•
Demographics: Participants aged 18 to 70, with a 60:40 male-to-female ratio

•Recording Specifications
•
Recording Type: Scripted monologues and prompt-based recordings

•
Audio Duration: 5 to 30 seconds per file

•
Format: WAV, mono channel, 16-bit, 8 kHz & 16 kHz sample rates

•
Environment: Clean, noise-free conditions to ensure clarity and usability

Topic Coverage
The dataset covers a wide variety of general conversation scenarios, including:
•Daily Conversations
•Topic-Specific Discussions
•General Knowledge and Advice
•Idioms and Sayings
Contextual Features
To enhance authenticity, the prompts include:
•
Names: Male and female names specific to different Russia regions

•
Addresses: Commonly used address formats in daily Russian speech

•
Dates & Times: References used in general scheduling and time expressions

•
Organization Names: Names of businesses, institutions, and other entities

•
Numbers & Currencies: Mentions of quantities, prices, and monetary values

Each prompt is designed to reflect everyday use cases, making it suitable for developing generalized NLP and ASR solutions.
Transcription
Every audio file in the dataset is accompanied by a verbatim text transcription, ensuring accurate training and evaluation of speech models.
•
Content: Exact match to the spoken audio

•
Format: Plain text (.TXT), named identically to the corresponding audio file

•
Quality Control: All transcripts are validated by native Russian transcribers

Metadata
Rich metadata is included for detailed filtering and analysis:
•
Speaker Metadata: Unique speaker ID, age, gender, region, and dialect

•
Audio Metadata: Prompt transcript, recording setup, device specs, sample rate, bit depth, and format

Applications & Use Cases
This dataset can power a variety of Russian language AI technologies, including:
•
Speech Recognition Training: ASR model development and fine-tuning
E
Russian Speech Database
catalogue.elra.info
live.european-language-grid.eu
Updated Jun 3, 2005
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2005). Russian Speech Database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0050/
Explore at:
Dataset updated
Jun 3, 2005
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdfhttps://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdfhttps://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
The STC Russian speech database was recorded in 1996-1998. The main purpose of the database is to investigate individual speaker variability and to validate speaker recognition algorithms. The database was recorded through a 16-bit Vibra-16 Creative Labs sound card with an 11,025 Hz sampling rate.The database contains Russian read speech of 89 different speakers (54 male, 35 female), including 70 speakers with 15 sessions or more, 10 speakers with 10 sessions or more and 9 speakers with less than 10 sessions. The speakers were recorded in Saint-Petersburg and are within the age of 18-62. All are native speakers. The corpus consists of 5 sentences. Each speaker reads carefully but fluently each sentence 15 times on different dates over the period of 1-3 months. The corpus contains a total of 6,889 utterances and of 2 volumes, total size 700 MB uncompressed data. The signal of each utterance is stored as a separate file (approx. 126 KB). Total size of data for one speaker approximates 9,500 KB. Average utterance duration is about 5 sec.A file gives information about the speakers (speaker?s age and gender). The orthography and phonetic transcription of the corpus is given in separate files which contain the prompted sentences and their transcription in IPA. The signal files are raw files without any header, 16 bit per sample, linear, 11,025 Hz sample frequency. The recording conditions were as follows:Microphone: dynamic omnidirectional high-quality microphone, distance to mouth 5-10 cmEnvironment: office roomSampling rate: 11,025 HzResolution: 16 BitSound board: Creative Labs Vibra-16Means of delivery: CD-ROM
a
OPUS Russian Open Speech To Text Dataset v1.01
academictorrents.com
bittorrent
Updated May 4, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin (2020). OPUS Russian Open Speech To Text Dataset v1.01 [Dataset]. https://academictorrents.com/details/95b4cab0f99850e119114c8b6df00193ab5fa34f
Explore at:
bittorrent(381530620667)Available download formats
Dataset updated
May 4, 2020
Dataset authored and provided by
Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
v1.0-beta Arguably the largest public Russian STT dataset up to date: 15m utterances; 20 000 hours; 2.3 TB (in mono .wav format in int16); For more information please visit
m
Russian Receipts Image Dataset for training AI/ML Models
data.macgence.com
mp3
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Macgence (2025). Russian Receipts Image Dataset for training AI/ML Models [Dataset]. https://data.macgence.com/dataset/russian-receipts-image-dataset-for-training-aiml-models
Explore at:
mp3Available download formats
Dataset updated
Apr 10, 2025
Dataset authored and provided by
Macgence
License
https://data.macgence.com/terms-and-conditionshttps://data.macgence.com/terms-and-conditions
Time period covered
2025
Area covered
Worldwide
Variables measured
Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
Description
Improve AI/ML model performance with Macgence's Russian receipt dataset. High-quality, diverse images tailored for precision and advanced analytics!
open_stt_text
kaggle.com
zip
Updated Aug 10, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
lytic (2019). open_stt_text [Dataset]. https://www.kaggle.com/sorokin/open-stt-text
Explore at:
zip(92686433 bytes)Available download formats
Dataset updated
Aug 10, 2019
Authors
lytic
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Context

Russian Open Speech To Text (STT/ASR) Dataset.

Content

Transcriptions from validation and training subsets.

Acknowledgements

https://github.com/snakers4/open_stt
E
Russian Speech Kids Recognition Corpus (Desktop)
catalogue.elra.info
live.european-language-grid.eu
Updated Apr 7, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2020). Russian Speech Kids Recognition Corpus (Desktop) [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0228_95/
Explore at:
Dataset updated
Apr 7, 2020
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdfhttps://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdfhttps://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
This corpus comprises 19,164 entries uttered by 30 speakers (16 males and 14 females), recorded over 2 channels (desktop in quiet office). Speech samples are stored as a sequence of 16-bit 44.1kHz for a total of 4.15 hours of speech per channel.
F
Russian Call Center Data for Travel AI
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Russian Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-russian-russia
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Russian Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Russian -speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
Speech Data
The dataset includes 30 hours of dual-channel audio recordings between native Russian speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
•Participant Diversity:
•
Speakers: 60 native Russian contributors from our verified pool.

•
Regions: Covering multiple Russia provinces to capture accent and dialectal variation.

•
Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).

•Recording Details:
•
Conversation Nature: Naturally flowing, spontaneous customer-agent calls.

•
Call Duration: Between 5 and 15 minutes per session.

•
Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.

•
Recording Environment: Captured in controlled, noise-free, echo-free settings.

Topic Diversity
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
•Inbound Calls:
•Booking Assistance
•Destination Information
•Flight Delays or Cancellations
•Support for Disabled Passengers
•Health and Safety Travel Inquiries
•Lost or Delayed Luggage, and more
•Outbound Calls:
•Promotional Travel Offers
•Customer Feedback Surveys
•Booking Confirmations
•Flight Rescheduling Alerts
•Visa Expiry Notifications, and others
These scenarios help models understand and respond to diverse traveler needs in real-time.
Transcription
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-Stamped Segments
•Non-speech Markers (e.g., pauses, coughs)
•High transcription accuracy by dual-layered transcription review ensures word error rate under 5%.
Metadata
Extensive metadata enriches each call and speaker for better filtering and AI training:
•
Participant Metadata: ID, age, gender, region, accent, and dialect.

•
Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

Usage and Applications
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
•
ASR Systems: Train Russian speech-to-text engines for travel platforms.

<div style="margin-top:10px; margin-bottom: 10px; padding-left: 30px; display: flex;
Z
Russian Open Speech To Text (STT/ASR) Dataset
data.niaid.nih.gov
Updated Jun 4, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alexander Veysov (2021). Russian Open Speech To Text (STT/ASR) Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4899207
Explore at:
Dataset updated
Jun 4, 2021
Dataset provided by
Alexander Veysov
Dmitry Voronin
Diliara Nurtdinova
Anna Slizhikova
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Russian Open Speech To Text (STT/ASR) Dataset

Arguably the largest public Russian STT dataset up to date.
F
Russian Call Center Data for Retail & E-Commerce AI
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Russian Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-russian-russia
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Russian Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Russian speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native Russian speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.
•Participant Diversity:
•
Speakers: 60 native Russian speakers from our verified contributor pool.

•
Regions: Representing multiple provinces across Russia to ensure coverage of various accents and dialects.

•
Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.

•Recording Details:
•
Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.

•
Call Duration: Ranges from 5 to 15 minutes.

•
Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.

•
Recording Environment: Captured in clean conditions with no echo or background noise.

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.
•Inbound Calls:
•Product Inquiries
•Order Cancellations
•Refund & Exchange Requests
•Subscription Queries, and more
•Outbound Calls:
•Order Confirmations
•Upselling & Promotions
•Account Updates
•Loyalty Program Offers
•Customer Verifications, and others
Such variety enhances your model’s ability to generalize across retail-specific voice interactions.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•30 hours-coded Segments
•Non-speech Tags (e.g., pauses, cough)
•High transcription accuracy with word error rate < 5% due to double-layered quality checks.
These transcriptions are production-ready, making model training faster and more accurate.
Metadata
Rich metadata is available for each participant and conversation:
•
Participant Metadata: ID, age, gender, accent, dialect, and location.

•
Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.
Usage and Applications
This dataset is ideal for a range of voice AI and NLP applications:
•
Automatic Speech Recognition (ASR): Fine-tune Russian speech-to-text systems.

<span
F
Russian General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Russian General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-russian-russia
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Russian General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Russian speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Russian communication.
Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Russian speech models that understand and respond to authentic Russian accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Russian. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•
Speakers: 60 verified native Russian speakers from FutureBeeAI’s contributor community.

•
Regions: Representing various provinces of Russia to ensure dialectal diversity and demographic balance.

•
Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.

•Recording Details:
•
Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.

•
Duration: Each conversation ranges from 15 to 60 minutes.

•
Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.

•
Environment: Quiet, echo-free settings with no background noise.

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•
Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.

•
Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple Russian speech and language AI applications:
•
ASR Development: Train accurate speech-to-text systems for Russian.

•
Voice Assistants: Build smart assistants capable of understanding natural Russian conversations.

<span
h
audio_tp
huggingface.co
Updated Oct 25, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vladimir (2024). audio_tp [Dataset]. https://huggingface.co/datasets/firstap/audio_tp
Explore at:
Dataset updated
Oct 25, 2024
Authors
Vladimir
Description
Dusha is a bi-modal corpus suitable for speech emotion recognition (SER) tasks. The dataset consists of audio recordings with Russian speech and their emotional labels. The corpus contains approximately 350 hours of data. Four basic emotions that usually appear in a dialog with a virtual assistant were selected: Happiness (Positive), Sadness, Anger and Neutral emotion.
F
Russian Call Center Data for Telecom AI
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Russian Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-russian-russia
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Russian Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Russian-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native Russian speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
•Participant Diversity:
•
Speakers: 60 native Russian speakers from our verified contributor pool.

•
Regions: Representing multiple provinces across Russia to ensure coverage of various accents and dialects.

•
Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.

•Recording Details:
•
Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.

•
Call Duration: Ranges from 5 to 15 minutes.

•
Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.

•
Recording Environment: Captured in clean conditions with no echo or background noise.

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral ensuring broad scenario coverage for telecom AI development.
•Inbound Calls:
•Phone Number Porting
•Network Connectivity Issues
•Billing and Payments
•Technical Support
•Service Activation
•International Roaming Enquiry
•Refund Requests and Billing Adjustments
•Emergency Service Access, and others
•Outbound Calls:
•Welcome Calls & Onboarding
•Payment Reminders
•Customer Satisfaction Surveys
•Technical Updates
•Service Usage Reviews
•Network Complaint Status Calls, and more
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, coughs)
•High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
Metadata
Rich metadata is available for each participant and conversation:
•
Participant Metadata: ID, age, gender, accent, dialect, and location.
h
audio_data_russian_annotated
huggingface.co
Updated Jun 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
fgfd (2025). audio_data_russian_annotated [Dataset]. https://huggingface.co/datasets/kijjjj/audio_data_russian_annotated
Explore at:
Dataset updated
Jun 21, 2025
Authors
fgfd
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Audio Russian Annotated

This is a dataset with Russian annotated audio data, split into train for tasks like text-to-speech, speech recognition, and speaker identification.

Features

text: Audio transcription (string). speaker_name: Speaker identifier (string). audio: Audio file. utterance_pitch_mean: The average pitch of the speech utterance (float64). utterance_pitch_std: The standard deviation of pitch, representing variability in intonation (float64) snr:… See the full description on the dataset page: https://huggingface.co/datasets/kijjjj/audio_data_russian_annotated.

Facebook

Twitter

Click to copy link

Link copied

Cite

Unidata (2025). Russian Speech Recognition Dataset - 338 Hours [Dataset]. https://www.kaggle.com/datasets/unidpro/russian-speech-recognition-dataset

Russian Speech Recognition Dataset - 338 Hours

Dataset comprises 338 hours of telephone dialogues in Russian

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jun 30, 2025

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Unidata

License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

Russian Speech Dataset for recognition task

Dataset comprises 338 hours of telephone dialogues in Russian, collected from 460 native speakers across various topics and domains, with an impressive 98% Word Accuracy Rate. It is designed for research in speech recognition, focusing on various recognition models, primarily aimed at meeting the requirements for automatic speech recognition (ASR) systems.

By utilizing this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR) systems, transcribing audio, and natural language processing (NLP). - Get the data

💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

Metadata for the dataset

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22059654%2Fa3f375fb273dcad3fe17403bdfccb63b%2Fssssssssss.PNG?generation=1739884059328284&alt=media" alt=""> - Audio files: High-quality recordings in WAV format - Text transcriptions: Accurate and detailed transcripts for each audio segment - Speaker information: Metadata on native speakers, including gender and etc - Topics: Diverse domains such as general conversations, business and etc

The native speakers and various topics and domains covered in the dataset make it an ideal resource for research community, allowing researchers to study spoken languages, dialects, and language patterns.

🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

Clear search

Close search

Google apps

Main menu

Russian Speech Recognition Dataset - 338 Hours

Russian Speech Dataset for recognition task

💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

Metadata for the dataset

🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

Russian Speech Recognition Dataset

Russian Speech Recognition Corpus (Desktop) - 25.85 hours

Golos

230 Hours – Russian Speaking English Speech Data by Mobile Phone

Russian Call Center Data for Healthcare AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

HENSOLDT ANALYTICS Speech-to-text for Russian

Russian General Domain Scripted Monologue Speech Data

Introduction

Speech Data

Topic Coverage

Contextual Features

Transcription

Metadata

Applications & Use Cases

Russian Speech Database

OPUS Russian Open Speech To Text Dataset v1.01

Russian Receipts Image Dataset for training AI/ML Models

open_stt_text

Context

Content

Acknowledgements

Russian Speech Kids Recognition Corpus (Desktop)

Russian Call Center Data for Travel AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

Russian Open Speech To Text (STT/ASR) Dataset

Russian Call Center Data for Retail & E-Commerce AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

Russian General Conversation Speech Dataset for ASR

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

audio_tp

Russian Call Center Data for Telecom AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

audio_data_russian_annotated

Russian Speech Recognition Dataset - 338 HoursSee More Versions

Dataset comprises 338 hours of telephone dialogues in Russian

Russian Speech Dataset for recognition task

💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

Metadata for the dataset

🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

Russian Speech Recognition Dataset - 338 Hours