taylorbobaylor/google-colab dataset hosted on Hugging Face and contributed by the HF Datasets community
dcrescentiai/test-for-colab dataset hosted on Hugging Face and contributed by the HF Datasets community
Nitin12340/my-colab-upload dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/openrail/https://choosealicense.com/licenses/openrail/
Poloman/Colab dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality.
https://choosealicense.com/licenses/openrail/https://choosealicense.com/licenses/openrail/
itamarcard/colab dataset hosted on Hugging Face and contributed by the HF Datasets community
dwb2023/ragas-golden-dataset-colab dataset hosted on Hugging Face and contributed by the HF Datasets community
Sajjadalgburi/files-colab dataset hosted on Hugging Face and contributed by the HF Datasets community
viksi01/cagliostro-colab-ui dataset hosted on Hugging Face and contributed by the HF Datasets community
jorgeean1777/google-collab dataset hosted on Hugging Face and contributed by the HF Datasets community
Man-snow/evolved-math-problems-from-colab dataset hosted on Hugging Face and contributed by the HF Datasets community
maleehaasghar/sadtalker-colab-assets dataset hosted on Hugging Face and contributed by the HF Datasets community
n8n - Secure Workflow Automation for Technical Teams
n8n is a workflow automation platform that gives technical teams the flexibility of code with the speed of no-code. With 400+ integrations, native AI capabilities, and a fair-code license, n8n lets you build powerful automations while maintaining full control over your data and deployments.
Key Capabilities
Code When You Need It: Write JavaScript/Python, add npm packages, or use the visual interface AI-Nativeโฆ See the full description on the dataset page: https://huggingface.co/datasets/omarelsayeed/n8n-from-colab.
Jokoasa/cagliostro-colab-ui dataset hosted on Hugging Face and contributed by the HF Datasets community
mohamed-illiyas/wav2vec2-base-lj-demo-colab dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Interactions Dataset
We created the interactions dataset from all model interactions recorded in https://github.com/clembench/clembench-runs.git for version v2.0. The dataset is structured as a conversational dataset that contains samples that specify a list of messages. These messages usually iterate on roles, that is, between a user and an assistant, and carry textual content. Furthermore, we added to each sample a meta annotation that informs about game, experiment, task_idโฆ See the full description on the dataset page: https://huggingface.co/datasets/colab-potsdam/playpen-data.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Jukebox Embeddings for MusicNet Dataset
Repo with Colab notebook used to extract the embeddings.
Overview
This dataset extends the MusicNet Dataset by providing embeddings for each audio file.
Original MusicNet Dataset
Link to original dataset
Jukebox Embeddings
Embeddings are derived from OpenAI's Jukebox model, following the approach described in Castellon et al. (2021) with some modifications followed in Spotify's Llark paper:
Source: Output ofโฆ See the full description on the dataset page: https://huggingface.co/datasets/jonflynn/musicnet_jukebox_embeddings.
JokoSusiloA/cagliostro-colab-ui dataset hosted on Hugging Face and contributed by the HF Datasets community
kiluade/cagliostro-colab-ui-sktch1 dataset hosted on Hugging Face and contributed by the HF Datasets community
kiluade/cagliostro-colab-ui-gym-track-jacket dataset hosted on Hugging Face and contributed by the HF Datasets community
taylorbobaylor/google-colab dataset hosted on Hugging Face and contributed by the HF Datasets community