This dataset was created by ehddnr301
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Google Speech Commands Dataset v0.02 is a curated collection of short (approximately one-second) audio recordings of spoken words, specifically designed for training and benchmarking keyword spotting systems. Each recording captures a single spoken command uttered by a diverse set of speakers, making the dataset highly valuable for developing robust, real-world voice-controlled applications. The commands include common terms such as "yes", "no", "up", "down", "left", "right", "on", "off", "stop", and "go", among others.
In addition to the primary command recordings, the dataset also provides a set of background noise audio files. These files, stored in a dedicated folder, are intended to support data augmentation techniques and help improve model performance in noisy environments. The dataset has been widely adopted in both academic research and industry applications, serving as a benchmark for lightweight and efficient speech recognition systems.
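As a rough illustration of the augmentation idea described above, the sketch below mixes a random crop of a background-noise clip into a command clip at a chosen signal-to-noise ratio. It is an assumption-laden example, not the dataset's official tooling: the function names, the SNR-based scaling, and the use of plain Python lists of float samples are all choices made here for illustration.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a sequence of float samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_with_noise(speech, noise, snr_db, rng=None):
    """Add a random crop of `noise` to `speech` at roughly `snr_db` decibels.

    Hypothetical helper: both inputs are lists of floats at the same sample rate.
    """
    rng = rng or random.Random()
    if not speech:
        return []
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")

    # Pick a random window of the (longer) noise recording.
    start = rng.randint(0, len(noise) - len(speech))
    crop = noise[start:start + len(speech)]

    p_speech = mean_power(speech)
    p_noise = mean_power(crop)
    if p_speech == 0.0 or p_noise == 0.0:
        return list(speech)  # nothing meaningful to scale against

    # Scale the noise so that p_speech / p_scaled_noise == 10 ** (snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, crop)]
```

Lower `snr_db` values give noisier training examples; in practice one would typically sample the SNR from a range rather than fix it.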
https://creativecommons.org/publicdomain/zero/1.0/
Variational Autoencoder (VAE) run on the TensorFlow Speech Recognition Challenge data.
The code to train the VAE is here: https://www.kaggle.com/holzner/variational-autoencoder-for-speech-dataset
The VAE was trained on the amplitude part of the spectrograms using a convolutional encoder and a decoder consisting of convolutional and upscaling layers. It was trained only on those classes of the train dataset which should be predicted in the challenge (excluding the background noise samples).
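To make "the amplitude part of the spectrograms" concrete, here is a minimal sketch of how a magnitude spectrogram can be computed: the signal is cut into overlapping frames, each frame is windowed, and only the absolute value of its discrete Fourier transform is kept (the phase is discarded). The frame length, hop size, and Hann window are assumptions for illustration and are not taken from the linked notebook. The pure-Python DFT is slow and is used only to keep the example self-contained; an FFT routine would normally be used instead.

```python
import cmath
import math


def amplitude_spectrogram(samples, frame_len=256, hop=128):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes.

    Hypothetical helper: `samples` is a list of floats; parameters are illustrative.
    """
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]

    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        # Only the non-negative frequency bins are needed for a real signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))  # keep amplitude, drop phase
        frames.append(magnitudes)
    return frames
```

The result is a time-by-frequency grid of non-negative values, which is the kind of 2D input a convolutional encoder can consume.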
The motivation was to produce features that make it easier to distinguish 'unknown' classes, which are absent from the train dataset but present in the test dataset.
The dataset contains the following columns:
https://creativecommons.org/publicdomain/zero/1.0/
I don't know why Kaggle removed the public test dataset from the input folder.
In any case, I re-uploaded the file for anyone who wants to test their model.
"7z x -oxxxx/" is the command for extracting files into the xxxx folder.
Note: xxxx is the output folder name; there is no space between -o and xxxx/.
Good luck.
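For anyone scripting the extraction, the sketch below builds the same 7z invocation from Python, keeping the output folder attached directly to the -o switch as the note above requires. The helper names and the archive name "test.7z" are hypothetical (the original command does not name the archive), and the example assumes the 7z executable is on the PATH.

```python
import subprocess


def build_7z_extract_command(archive_path, output_dir):
    """Return the argument list for extracting `archive_path` into `output_dir`.

    7-Zip expects the directory glued to the switch ("-oxxxx/"), with no space.
    """
    return ["7z", "x", archive_path, f"-o{output_dir.rstrip('/')}/"]


def extract(archive_path, output_dir):
    """Run the extraction; raises CalledProcessError if 7z reports a failure."""
    subprocess.run(build_7z_extract_command(archive_path, output_dir), check=True)


# Example with a hypothetical archive name:
# extract("test.7z", "xxxx")
```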
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
One can use these two datasets in various ways. Here are some things I am interested in seeing answered:
An interesting challenge (an idea for a competition) would be to train on this dataset and evaluate on the real dataset.
Here I describe how the synthetic audio samples were created. Code is available at https://github.com/JohannesBuchner/spoken-command-recognition, in the "tensorflow-speech-words" folder.
This work built upon
Please provide appropriate citations to the above when using this work.
To cite the resulting dataset, you can use:
APA-style citation: "Buchner J. Synthetic Speech Commands: A public dataset for single-word speech recognition, 2017. Available from https://www.kaggle.com/jbuchner/synthetic-speech-commands-dataset/".
BibTeX:
@article{speechcommands,
  title={Synthetic Speech Commands: A public dataset for single-word speech recognition.},
  author={Buchner, Johannes},
  journal={Dataset available from https://www.kaggle.com/jbuchner/synthetic-speech-commands-dataset/},
  year={2017}
}
Thanks to everyone trying to improve open source voice detection and speech recognition.
Original audio files were preprocessed as numeric arrays.
The original data can be downloaded here:
http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip
Command Names: ['stop' 'up' 'yes' 'left' 'right' 'go' 'down' 'no']
This catalog really impressed me => TensorFlow Datasets
It could be fun to compare with the same content in other languages, don't you think?
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credits: http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
There are 610 audio files, divided as follows:
- 193 for birds
- 207 for cats
- 210 for dogs
This is a small part of a public dataset for single-word speech recognition, 2017.
Citing:
@article{speechcommands,
  title={Speech Commands: A public dataset for single-word speech recognition.},
  author={Warden, Pete},
  journal={Dataset available from http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz},
  year={2017}
}
Educational data brought in for SCLAB employee training. Source: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/audio/speech_commands.py
This dataset was created by ehddnr301