100+ datasets found
  1. speech_commands

    • huggingface.co
    • tensorflow.org
    • +1 more
    Updated Dec 19, 2021
    + more versions
    Cite
    Google (2021). speech_commands [Dataset]. https://huggingface.co/datasets/google/speech_commands
    Explore at:
    169 scholarly articles cite this dataset (View in Google Scholar)
    Dataset updated
    Dec 19, 2021
    Dataset authored and provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a set of one-second .wav audio files, each containing a single spoken English word or background noise. These words are from a small set of commands, and are spoken by a variety of different speakers. This data set is designed to help train simple machine learning models. This dataset is covered in more detail at https://arxiv.org/abs/1804.03209.

    Version 0.01 of the data set (configuration "v0.01") was released on August 3rd 2017 and contains 64,727 audio files.

    In version 0.01 thirty different words were recorded: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go", "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin", "Sheila", "Tree", "Wow".

    In version 0.02 more words were added: "Backward", "Forward", "Follow", "Learn", "Visual".

    In both versions, ten of them are used as commands by convention: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go". The other words are considered auxiliary (in the current implementation this is marked by a True value of the "is_unknown" feature). Their function is to teach a model to distinguish core words from unrecognized ones.

    The _silence_ class contains a set of longer audio clips that are either recordings or a mathematical simulation of noise.
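    The core-versus-auxiliary convention described above can be sketched as a small labeling helper. This is a minimal illustration; the constant and function names below are ours, not part of the dataset's own API.

```python
# Core/auxiliary label convention from the dataset description: the ten
# command words keep their own label, every other word is auxiliary
# (corresponding to is_unknown=True in the Hugging Face release), and
# _silence_ is its own class. Names here are illustrative.
CORE_COMMANDS = {"yes", "no", "up", "down", "left", "right",
                 "on", "off", "stop", "go"}

def label_word(word: str) -> tuple[str, bool]:
    """Return (label, is_unknown) for a spoken word."""
    w = word.lower()
    if w == "_silence_":
        return "_silence_", False
    return (w, False) if w in CORE_COMMANDS else (w, True)
```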

  2. Speech Commands Dataset v0.02

    • kaggle.com
    zip
    Updated Feb 14, 2025
    Cite
    Yash Dogra (2025). Speech Commands Dataset v0.02 [Dataset]. https://www.kaggle.com/datasets/yashdogra/speech-commands
    Explore at:
    zip (2418503657 bytes). Available download formats
    Dataset updated
    Feb 14, 2025
    Authors
    Yash Dogra
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Google Speech Commands Dataset v0.02 is a curated collection of short (approximately one-second) audio recordings of spoken words, specifically designed for training and benchmarking keyword spotting systems. Each recording captures a single spoken command uttered by a diverse set of speakers, making the dataset highly valuable for developing robust, real-world voice-controlled applications. The commands include common terms such as "yes", "no", "up", "down", "left", "right", "on", "off", "stop", and "go", among others.

    In addition to the primary command recordings, the dataset also provides a set of background noise audio files. These files, stored in a dedicated folder, are intended to support data augmentation techniques and help improve model performance in noisy environments. The dataset has been widely adopted in both academic research and industry applications, serving as a benchmark for lightweight and efficient speech recognition systems.

  3. Google Speech Commands V2

    • kaggle.com
    zip
    Updated Jan 25, 2023
    Cite
    Kaladin Stormblessed (2023). Google Speech Commands V2 [Dataset]. https://www.kaggle.com/datasets/sylkaladin/speech-commands-v2
    Explore at:
    zip (2418503657 bytes). Available download formats
    Dataset updated
    Jan 25, 2023
    Authors
    Kaladin Stormblessed
    Description

    Dataset

    This dataset was created by Kaladin Stormblessed

    Contents

  4. google-speech-commands-wav2vec2-960h

    • huggingface.co
    Updated Feb 8, 2024
    + more versions
    Cite
    Hunzla Usman (2024). google-speech-commands-wav2vec2-960h [Dataset]. https://huggingface.co/datasets/Hunzla/google-speech-commands-wav2vec2-960h
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Feb 8, 2024
    Authors
    Hunzla Usman
    Description

    Hunzla/google-speech-commands-wav2vec2-960h dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Google Speech Commands

    • kaggle.com
    zip
    Updated Aug 10, 2020
    Cite
    Neeha Kurelli (2020). Google Speech Commands [Dataset]. https://www.kaggle.com/neehakurelli/google-speech-commands
    Explore at:
    zip (1482297058 bytes). Available download formats
    Dataset updated
    Aug 10, 2020
    Authors
    Neeha Kurelli
    Description

    Dataset

    This dataset was created by Neeha Kurelli

    Contents

  6. google-speech-commands-mfcc

    • kaggle.com
    zip
    Updated Jan 6, 2025
    Cite
    Olggol (2025). google-speech-commands-mfcc [Dataset]. https://www.kaggle.com/datasets/olggol/google-speech-commands-mfcc
    Explore at:
    zip (6916282126 bytes). Available download formats
    Dataset updated
    Jan 6, 2025
    Authors
    Olggol
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Olggol

    Released under Apache 2.0

    Contents

  7. Arabic Speech Commands Dataset

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    Updated Apr 5, 2021
    Cite
    Abdulkader Ghandoura (2021). Arabic Speech Commands Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4662480
    Explore at:
    Dataset updated
    Apr 5, 2021
    Dataset provided by
    Syrian Virtual University
    Authors
    Abdulkader Ghandoura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Arabic Speech Commands Dataset

    This dataset is designed to help train simple machine learning models that serve educational and research purposes in the speech recognition domain, mainly for keyword spotting tasks.

    Dataset Description

    Our dataset is a list of pairs (x, y), where x is the input speech signal and y is the corresponding keyword. The final dataset consists of 12000 such pairs, comprising 40 keywords. Each audio file is one second long, sampled at 16 kHz. We have 30 participants, each of whom recorded 10 utterances of each keyword. Therefore, we have 300 audio files for each keyword (30 * 10 * 40 = 12000), and the total size of all the recorded keywords is ~384 MB. The dataset also contains several background-noise recordings obtained from various natural sources of noise. We saved these audio files in a separate folder named background_noise, with a total size of ~49 MB.

    Dataset Structure

    There are 40 folders, each of which represents one keyword and contains 300 files. The first eight digits of each file name identify the contributor, while the last two digits identify the round number. For example, the file path rotate/00000021_NO_06.wav indicates that the contributor with the ID 00000021 pronounced the keyword rotate for the 6th time.
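    The file-naming scheme above (first eight digits identify the contributor, last two digits the round number) can be parsed with a short helper. The function name and returned dictionary are ours, purely for illustration.

```python
import re

# Parse a recording path of the form "keyword/CCCCCCCC_XX_RR.wav", where
# CCCCCCCC is the 8-digit contributor ID and RR the 2-digit round number,
# e.g. "rotate/00000021_NO_06.wav" -> contributor 00000021, round 6.
def parse_recording_path(path: str) -> dict:
    keyword, fname = path.split("/")
    m = re.match(r"(\d{8})_([A-Z]+)_(\d{2})\.wav$", fname)
    if m is None:
        raise ValueError(f"unexpected file name: {fname}")
    return {"keyword": keyword,
            "contributor_id": m.group(1),
            "round": int(m.group(3))}
```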

    Data Split

    We recommend using the provided CSV files in your experiments. We kept 60% of the dataset for training, 20% for validation, and the remaining 20% for testing. In our split method, we guarantee that all recordings of a certain contributor are within the same subset.
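    The speaker-disjoint split described above (contributors, not individual clips, are assigned to subsets) can be sketched as follows. The dataset ships its own CSV splits; this helper, with our own name and signature, only illustrates the idea.

```python
# Assign whole contributors to train/validation/test so that all
# recordings of a given contributor land in exactly one subset, matching
# the 60/20/20 split recommended in the dataset description.
def split_by_contributor(contributor_ids, train=0.6, val=0.2):
    ids = sorted(set(contributor_ids))
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return (set(ids[:n_train]),
            set(ids[n_train:n_train + n_val]),
            set(ids[n_train + n_val:]))
```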

    License

    This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. For more details, see the LICENSE file in this folder.

    Citations

    If you want to use the Arabic Speech Commands dataset in your work, please cite it as:

    @article{arabicspeechcommandsv1,
      author    = {Ghandoura, Abdulkader and Hjabo, Farouk and Al Dakkak, Oumayma},
      title     = {Building and Benchmarking an Arabic Speech Commands Dataset for Small-Footprint Keyword Spotting},
      journal   = {Engineering Applications of Artificial Intelligence},
      year      = {2021},
      publisher = {Elsevier}
    }

  8. Speech_Command|Application of Speech Recognition

    • kaggle.com
    zip
    Updated Mar 28, 2022
    Cite
    VK (2022). Speech_Command|Application of Speech Recognition [Dataset]. https://www.kaggle.com/datasets/venkatkumar001/speechcommands
    Explore at:
    zip (820205557 bytes). Available download formats
    Dataset updated
    Mar 28, 2022
    Authors
    VK
    Description

    Google Research published the Speech Commands dataset; here I'm publishing only 14 subcategories of the one-second voice clips.

    I preprocessed the audio, generated JSON files, and uploaded the dataset. Feel free to use it.

    Enjoy, and build your keyword-spotting application efficiently!

  9. Data from: Written and spoken digits database for multimodal learning

    • zenodo.org
    bin
    Updated Jan 20, 2021
    + more versions
    Cite
    Lyes Khacef; Laurent Rodriguez; Benoit Miramond (2021). Written and spoken digits database for multimodal learning [Dataset]. http://doi.org/10.5281/zenodo.3515935
    Explore at:
    bin. Available download formats
    Dataset updated
    Jan 20, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lyes Khacef; Laurent Rodriguez; Benoit Miramond
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database description:

    The written and spoken digits database is not a new database but one constructed from existing databases, in order to provide a ready-to-use database for multimodal fusion.

    The written digits database is the original MNIST handwritten digits database [1] with no additional processing. It consists of 70000 images (60000 for training and 10000 for test) of 28 x 28 = 784 dimensions.

    The spoken digits database was extracted from Google Speech Commands [2], an audio dataset of spoken words that was proposed to train and evaluate keyword spotting systems. It consists of 105829 utterances of 35 words, amongst which 38908 utterances of the ten digits (34801 for training and 4107 for test). A pre-processing was done via the extraction of the Mel Frequency Cepstral Coefficients (MFCC) with a framing window size of 50 ms and frame shift size of 25 ms. Since the speech samples are approximately 1 s long, we end up with 39 time slots. For each one, we extract 12 MFCC coefficients with an additional energy coefficient. Thus, we have a final vector of 39 x 13 = 507 dimensions. Standardization and normalization were applied on the MFCC features.
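    The 507-dimensional figure quoted above follows directly from the framing parameters. A quick arithmetic check, with variable names of our own choosing:

```python
# Check the MFCC feature dimensions from the description: framing a ~1 s
# signal with a 50 ms window and 25 ms shift gives 39 frames, and
# 12 MFCC coefficients plus 1 energy coefficient give 13 values per frame.
def n_frames(signal_ms: int, window_ms: int, shift_ms: int) -> int:
    return (signal_ms - window_ms) // shift_ms + 1

frames = n_frames(1000, 50, 25)          # 39 time slots
coeffs_per_frame = 12 + 1                # 12 MFCCs + 1 energy coefficient
feature_dim = frames * coeffs_per_frame  # 39 * 13 = 507
```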

    To construct the multimodal digits dataset, we associated written and spoken digits of the same class respecting the initial partitioning in [1] and [2] for the training and test subsets. Since we have less samples for the spoken digits, we duplicated some random samples to match the number of written digits and have a multimodal digits database of 70000 samples (60000 for training and 10000 for test).

    The dataset is provided in six files as described below. Therefore, if a shuffle is performed on the training or test subsets, it must be performed in unison with the same order for the written digits, spoken digits and labels.

    Files:

    • data_wr_train.npy: 60000 samples of 784-dimensional written digits for training;
    • data_sp_train.npy: 60000 samples of 507-dimensional spoken digits for training;
    • labels_train.npy: 60000 labels for the training subset;
    • data_wr_test.npy: 10000 samples of 784-dimensional written digits for test;
    • data_sp_test.npy: 10000 samples of 507-dimensional spoken digits for test;
    • labels_test.npy: 10000 labels for the test subset.

    References:

    1. LeCun, Y. & Cortes, C. (1998), "MNIST handwritten digit database".
    2. Warden, P. (2018), "Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition".
  10. multilingual-speech-commands-15lang

    • huggingface.co
    Updated Jun 1, 2025
    + more versions
    Cite
    Artur Muratov (2025). multilingual-speech-commands-15lang [Dataset]. https://huggingface.co/datasets/artur-muratov/multilingual-speech-commands-15lang
    Explore at:
    Dataset updated
    Jun 1, 2025
    Authors
    Artur Muratov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multilingual Speech Commands Dataset (15 Languages, Augmented)

    This dataset contains augmented speech command samples in 15 languages, derived from multiple public datasets. Only commands that overlap with the Google Speech Commands (GSC) vocabulary are included, making the dataset suitable for multilingual keyword spotting tasks aligned with GSC-style classification. Audio samples have been augmented using standard audio techniques to improve model robustness (e.g., time-shifting… See the full description on the dataset page: https://huggingface.co/datasets/artur-muratov/multilingual-speech-commands-15lang.

  11. mini_speech_commands

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Cite
    AntonFilatov (2023). mini_speech_commands [Dataset]. https://www.kaggle.com/datasets/antfilatov/mini-speech-commands
    Explore at:
    zip (178683884 bytes). Available download formats
    Dataset updated
    Oct 5, 2023
    Authors
    AntonFilatov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by AntonFilatov

    Released under Attribution 4.0 International (CC BY 4.0)

    Contents

  12. Spiking Google Speech Commands

    • kaggle.com
    zip
    Updated Nov 11, 2025
    Cite
    Thomas Shoesmith (2025). Spiking Google Speech Commands [Dataset]. https://www.kaggle.com/datasets/thomasshoesmith/spiking-google-speech-commands
    Explore at:
    zip (110023675 bytes). Available download formats
    Dataset updated
    Nov 11, 2025
    Authors
    Thomas Shoesmith
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Thomas Shoesmith

    Released under Database: Open Database License (ODbL); Contents: Database Contents License (DbCL)

    Contents

  13. Google Wake Words and Voice Commands in US English

    • gts.ai
    json
    Updated Nov 20, 2023
    Cite
    GTS (2023). Google Wake Words and Voice Commands in US English [Dataset]. https://gts.ai/case-study/google-wake-words-and-voice-commands-in-us-english/
    Explore at:
    json. Available download formats
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Discover Google wake words and voice commands in US English for seamless interaction with your Google enabled devices and services.

  14. Google-synth: A Synthesized Punjabi Speech Dataset

    • figshare.com
    zip
    Updated Jul 3, 2023
    Cite
    Satwinder singh; Ruili Wang; Feng Hou (2023). Google-synth: A Synthesized Punjabi Speech Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.23615607.v1
    Explore at:
    zip. Available download formats
    Dataset updated
    Jul 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Satwinder singh; Ruili Wang; Feng Hou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Google-synth dataset comprises a synthetic Punjabi dataset that has been generated using Google's Cloud Text-to-Speech service. This dataset encompasses approximately 50,000 synthesized utterances, featuring four synthetic speakers (two male and two female), which amounts to approximately 38 hours of audio data. To facilitate training, validation, and testing, the dataset has been pre-divided into three portions: 80% for training, 10% for validation, and 10% for testing. The dataset is meticulously organized, with all speech files stored in the "clips" directory. The corresponding transcript files (train, dev, and test) are situated in the parent directory and follow the TSV (Tab-Separated Values) format. Each line within the transcript files represents a label assigned to a particular speech sample from the clips directory. The first column of each line contains the path and name of the corresponding WAV file, while the second column, separated by a tab, contains the transcript in textual form.

  15. Another Arabic Voice Command Dataset for Multiple Speech Processing Tasks

    • figshare.com
    application/x-gzip
    Updated Nov 8, 2023
    Cite
    Mohamed LICHOURI; Khaled Lounnas; Adil Bakri (2023). Another Arabic Voice Command Dataset for Multiple Speech Processing Tasks [Dataset]. http://doi.org/10.6084/m9.figshare.24520546.v1
    Explore at:
    application/x-gzip. Available download formats
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mohamed LICHOURI; Khaled Lounnas; Adil Bakri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The expansion of Internet connectivity has revolutionized our daily lives, with people increasingly relying on smartphones and laptops for various tasks. This technological evolution has prompted the development of innovative solutions to enhance the quality of life for diverse populations, including the elderly and individuals with disabilities. Among the most impactful advancements are voice-command-enabled technologies such as Siri and Google voice commands, which are built upon the foundation of speech recognition modules, a critical component in facilitating human-machine communication.

    Automatic Speech Recognition (ASR) has witnessed significant progress in achieving human-like performance through data-driven methods. In the context of our research, we have meticulously crafted an Arabic voice command dataset to facilitate advancements in ASR and other speech processing tasks. This dataset comprises 10 distinct commands spoken by 10 unique speakers, each repeated 10 times. Despite its modest size, the dataset has demonstrated remarkable performance across a range of speech processing tasks.

    The dataset was rigorously evaluated, yielding exceptional results. In ASR, it achieved an accuracy of 95.9%, showcasing its potential for effectively transcribing spoken Arabic commands. Furthermore, the dataset excelled in speaker identification, gender recognition, accent recognition, and spoken language understanding, with macro F1 scores of 99.67%, 100%, 100%, and 97.98%, respectively.

    This Arabic Voice Command Dataset represents a valuable resource for researchers and developers in the field of speech processing and human-machine interaction. Its quality and diversity make it a robust foundation for developing and testing ASR and related systems, ultimately contributing to the advancement of voice-command technologies and their widespread accessibility.

  16. Speech Collection of Styles (SPECS)

    • zenodo.org
    zip
    Updated Feb 20, 2025
    + more versions
    Cite
    Anonymous; Anonymous (2025). Speech Collection of Styles (SPECS) [Dataset]. http://doi.org/10.5281/zenodo.14897750
    Explore at:
    zip. Available download formats
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset of keywords extracted from conversational-style speech and command-style speech and it is made up of 3 subsets: command keywords (ck), extended command keywords (eck) and conversational speech (cs). This dataset is intended solely for research on keyword recognition and speech style analysis.

    For each of the three datasets, we asked users to record themselves in a quiet environment, reciting given text excerpts with specific keywords inside them and saving the recordings as 16kHz 32-bit PCM WAVE files.

    We added an additional folder of background noises, which is a copy from Google's Speech Commands set of background noises.

    -------------------------------------------------------------------------------------------------------------------------------------------------------------

    For command keywords (ck), the users were given 10 text examples, each containing 10 repetitions of the following keywords: on, no, up, off, down, stop, go, right, yes and left. They were asked to pronounce the examples as if answering a device, using an intonation similar to command-style speech; for instance, as if the device asked, "Lights up or lights down?" and they answered "Up!" ten times, with as much variation in intonation between the repetitions as they could produce.

    ck sentences:

    1) On! (10 times)

    2) No! (10 times)

    3) Up! (10 times)

    4) Off! (10 times)

    5) Down! (10 times)

    6) Stop! (10 times)

    7) Go! (10 times)

    8) Right! (10 times)

    9) Yes! (10 times)

    10) Left! (10 times)

    -------------------------------------------------------------------------------------------------------------------------------------------------------------

    For conversational speech (cs), users were given 20 more elaborate text examples such as this one: "She put the book on the table. No other books were there. She then looked up to her mum.". They were asked to recite the examples in their normal speaking tone, as if telling a story to a friend. The cs recordings contained a total of 10 instances of the same keywords mentioned for ck.

    cs sentences:

    1) She put the book on the table. No other books were there. She then looked up to her mum.

    2) On Friday they were meeting. He had no idea what she was up to.

    3) She climbed up the ladder with no hesitation and stood on the roof.

    4) The alarm clock rang, but no lights were on to wake him up.

    5) He insisted on going up the mountain, even though no one followed.

    6) They were driving up the street on a sunny day. There was no traffic.

    7) The power went on, but there was no heat, so she bundled up in a blanket.

    8) She started the car up, but no radio was on to play her favourite music.

    9) The computer powered up, but no alerts came on the screen.

    10) No wonder she woke up tired. She had been on call 3 times this week already.

    11) She put off the task, unable to find the right resources. Down the hall, the phone rang.

    12) He switched off the lights, quietly stepped down the stairs and turned right.

    13) The cat jumped right off the neighbour’s fence and darted down our garden alley.

    14) She climbed down slowly, then jumped off right at the bottom.

    15) He pulled off his hat and looked right down at his shoes.

    16) They set off down the road, aiming to reach their destination right at dawn.

    17) The papers flew off the table and floated down to the floor, right next to his shoes.

    18) She carefully wiped the mud off her shoes and sat down, turning slightly, to her right.

    19) They turned right off the busy highway and drove down the country road.

    20) She got off the bus and walked down the street, heading right past the blue house.

    -------------------------------------------------------------------------------------------------------------------------------------------------------------

    For the extended command keywords (eck), we extended the keywords in the ck dataset by an additional 100 ms to the left and right. A few keywords needed to be manually corrected. Two speakers were removed from the dataset because they pronounced the keywords very quickly, which resulted in too many bad keywords.
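    The eck construction just described amounts to widening each keyword segment by 100 ms on each side, clamped to the clip boundaries. A sketch under the assumption of 16 kHz sample-indexed segments (function and parameter names are ours):

```python
# Widen a keyword segment [start, end) by pad_ms on each side, clamped to
# the clip length. At 16 kHz, 100 ms corresponds to 1600 samples.
def extend_segment(start: int, end: int, total: int,
                   pad_ms: int = 100, sr: int = 16000) -> tuple[int, int]:
    pad = pad_ms * sr // 1000
    return max(0, start - pad), min(total, end + pad)
```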

  17. voice-commands-google-dataset

    • huggingface.co
    Updated Oct 22, 2025
    Cite
    Adi Alia (2025). voice-commands-google-dataset [Dataset]. https://huggingface.co/datasets/adialia/voice-commands-google-dataset
    Explore at:
    Dataset updated
    Oct 22, 2025
    Authors
    Adi Alia
    Description

    adialia/voice-commands-google-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. Speech_commands_v0.02

    • kaggle.com
    zip
    Updated Jan 27, 2019
    Cite
    Tingting WANG (2019). Speech_commands_v0.02 [Dataset]. https://www.kaggle.com/datasets/mok0na/speech-commands-v002
    Explore at:
    zip (5181233170 bytes). Available download formats
    Dataset updated
    Jan 27, 2019
    Authors
    Tingting WANG
    Description

    Dataset

    This dataset was created by Tingting WANG

    Contents

  19. Russian Wake Words & Voice Commands Speech Data

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Russian Wake Words & Voice Commands Speech Data [Dataset]. https://www.futurebeeai.com/dataset/wake-words-and-commands-dataset/wake-words-and-commands-russian-russia
    Explore at:
    wav. Available download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    FutureBeeAI AI Data License Agreement: https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Russian Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.

    Speech Data

    This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:

    •Wake words alone
    •Wake words followed by command phrases

    Participant Diversity

    • Speakers: 50 native Russian speakers from the FutureBeeAI community
    • Regions: Participants from various Russian provinces, ensuring broad coverage of accents and dialects
    • Demographics: Ages 18–70; 60% male and 40% female participants

    Recording Details

    • Type: Scripted wake words and command phrases
    • Duration: 1 to 15 seconds per clip
    • Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz

    Dataset Diversity

    • Wake Word Types
      • Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.
      • Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.
      • Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more
    • Command Types by Use Case
      • Automobile: Play music, check directions, voice search, provide feedback, and more
      • Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more
      • Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.
    • Recording Environments
      • No background noise
      • Background traffic noise
      • People talking in the background
    • Speaking Pace
      • Normal speed
      • Fast speed

    This diversity ensures robust training for real-world voice assistant applications.

    Metadata

    Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.

    • Participant Metadata: Unique ID, age, gender, region, accent, dialect
    • Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format

    Use Cases & Applications

    • Voice Assistant Activation: Train models to accurately detect and trigger based on wake words
    • Smart Home Devices: Enable responsive voice control in smart appliances

  20. Voice Recognition Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Apr 15, 2025
    Cite
    Data Insights Market (2025). Voice Recognition Software Report [Dataset]. https://www.datainsightsmarket.com/reports/voice-recognition-software-537347
    Explore at:
    pdf, doc, ppt. Available download formats
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming voice recognition software market! This comprehensive analysis reveals market size, CAGR, key trends (AI, cloud solutions), challenges, and top companies. Explore regional breakdowns and future growth projections (2025-2033) for informed decision-making.
