Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Many datasets are available nowadays for training and experimentation in the field of recommender systems. In the recommendation of audiovisual content specifically, the MovieLens dataset is a prominent example. It focuses on the user-item relationship, providing actual interaction data between users and movies. However, although movies can be described with many characteristics, this dataset offers only limited information about movie genres.
In this work, we propose enriching the MovieLens dataset by incorporating metadata available on the web (such as cast, description, keywords, etc.) and movie trailers. From the trailers, we extract audio information and generate a transcription for each one, introducing a crucial textual dimension to the dataset. The audio information was extracted through waveform and frequency analysis, followed by dimensionality reduction techniques. Transcriptions were generated with the Whisper deep learning model. Finally, metadata was obtained from TMDB, and the BERT model was applied to extract embeddings.
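As a rough illustration of the pipeline described above (not the authors' exact code), the sketch below transcribes a trailer's audio with Whisper and mean-pools BERT token embeddings for the resulting text; the file name and model checkpoints are assumptions.

import torch
import whisper  # openai-whisper
from transformers import AutoTokenizer, AutoModel

# Transcribe the trailer audio (hypothetical file name)
asr = whisper.load_model("base")
transcript = asr.transcribe("trailer_audio.mp3")["text"]

# Embed the transcript (or a TMDB overview) with BERT
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer(transcript, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state  # shape (1, seq_len, 768)
embedding = hidden.mean(dim=1).squeeze(0)      # 768-dimensional vector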
These additional attributes enrich the original dataset and enable deeper, more precise analysis. The extended and enhanced dataset could therefore drive significant advances in recommender systems, improving the user experience through more relevant movie recommendations tailored to individual tastes and preferences.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Sayha
YouTube Video Audio and Subtitles Extraction
Sayha is a tool designed to download YouTube videos and extract their audio and subtitle data. This can be particularly useful for creating datasets for machine learning projects, transcription services, or language studies.
Features
Download YouTube videos. Extract audio tracks from videos. Retrieve and process subtitle files. Prepare datasets for various applications.
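Sayha's own commands are documented on the dataset page linked below; as a rough sketch of the kind of workflow it automates (not Sayha's actual code), yt-dlp can fetch a video's audio track and subtitles like this:

from yt_dlp import YoutubeDL

# Hypothetical example; ffmpeg is required for the audio extraction step
opts = {
    "format": "bestaudio/best",
    "writesubtitles": True,         # uploader-provided subtitles
    "writeautomaticsub": True,      # fall back to auto-generated captions
    "subtitleslangs": ["en"],
    "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
    "outtmpl": "%(id)s.%(ext)s",
}
with YoutubeDL(opts) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=VIDEO_ID"])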
Installation
Clone… See the full description on the dataset page: https://huggingface.co/datasets/sadece/sayha.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This distribution includes details of the SLAC multimodal music dataset as well as features extracted from it. The dataset is intended to facilitate research comparing the relative musical influences of four different musical modalities: symbolic, lyrical, audio and cultural. SLAC was assembled by independently collecting, for each of its component musical pieces, a symbolic MIDI encoding, a lyrical text transcription, an audio MP3 recording and cultural information mined from the internet. It is important to emphasize the independence of how each of these components was collected; for example, the MIDI and MP3 encodings of each piece were collected entirely separately, and neither was generated from the other.
Features have been extracted from each of the musical pieces in SLAC using the jMIR (http://jmir.sourceforge.net) feature extractor corresponding to each of the modalities: jSymbolic for symbolic, jLyrics for lyrics, jAudio for audio and jWebMiner2 for mining cultural data from search engines and Last.fm (https://www.last.fm).
SLAC is quite small, consisting of only 250 pieces, due to the difficulty of independently finding matching information in all four modalities. Although its small size does impose certain constraints, it is nonetheless the largest (and only) known dataset to include all four independently collected modalities.
The dataset is divided into ten genres, with 25 pieces belonging to each genre: Modern Blues, Traditional Blues, Baroque, Romantic, Bop, Swing, Hardcore Rap, Pop Rap, Alternative Rock and Metal. These can be collapsed into a 5-genre taxonomy, with 50 pieces per genre: Blues, Classical, Jazz, Rap and Rock. This facilitates experiments with both coarser and finer classes.
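As a small illustration (not part of the distribution), the 10-to-5 genre collapse described above can be expressed as a simple mapping:

# Collapse SLAC's 10 fine genres into its 5 coarse genres
COARSE_GENRE = {
    "Modern Blues": "Blues", "Traditional Blues": "Blues",
    "Baroque": "Classical", "Romantic": "Classical",
    "Bop": "Jazz", "Swing": "Jazz",
    "Hardcore Rap": "Rap", "Pop Rap": "Rap",
    "Alternative Rock": "Rock", "Metal": "Rock",
}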
SLAC was published at the ISMIR 2010 conference, and was itself an expansion of the SAC dataset (published at the ISMIR 2008 conference), which is identical except that it excludes the lyrics and lyrical features found in SLAC. Both ISMIR papers are included in this distribution.
Due to copyright limitations, this distribution does not include the actual music or lyrics of the pieces comprising SLAC. It does, however, include details of the contents of the dataset as well as features extracted from each of its modalities using the jMIR software. These include the original features extracted for the 2010 ISMIR paper, as well as an updated set of symbolic features extracted in 2021 using the newer jSymbolic 2.2 feature extractor (published at ISMIR 2018). These jSymbolic 2.2 features include both the full MIDI feature set and a “conservative” feature set meant to limit potential biases due to encoding practice. Feature values are distributed as CSV files, Weka ARFF (https://www.cs.waikato.ac.nz/ml/weka/) files and ACE XML (http://jmir.sourceforge.net) files.
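As a minimal sketch of working with the distributed feature files (the file names below are assumptions, not the actual names in the archive), the CSV and Weka ARFF variants can be loaded with standard Python tooling:

import pandas as pd
from scipy.io import arff

# Hypothetical file names; see the distribution for the real ones
csv_features = pd.read_csv("jSymbolic_features.csv")

arff_data, arff_meta = arff.loadarff("jAudio_features.arff")
audio_features = pd.DataFrame(arff_data)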
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains transcripts of all tracks by the Adjutant in the Starcraft I Terran campaigns. It includes both the base game and the Brood War expansion. Will 65 entries be good enough to extract assistantship response patterns? I will find out.
Curation Process
I extracted all the sound files from the local Starcraft installation using a CascLib-based extractor I wrote. This gave me a lot of .ogg files. Starcraft I file nomenclature is nice: for example, all files containing Adjutant… See the full description on the dataset page: https://huggingface.co/datasets/yxzwayne/TerranAdjutant-1.
This dataset comprises the course material of the first years of elementary school in Greece. It includes 29,698 signed phrases drawn from the 33 issues of 13 distinct textbooks of the A, B and C years of primary school. The Elementary Dataset consists of the following courses:
9,507 videos of Greek Language (1st, 2nd and 3rd year)
6,599 videos of Mathematics (1st, 2nd and 3rd year)
4,163 videos of Anthology of Greek Literacy (1st, 2nd, 3rd and 4th year)
5,528 videos of Environmental Studies (1st, 2nd and 3rd year)
2,069 videos of History (3rd year)
1,832 videos of Religious Study (3rd year)
Version 2 - Major Features and Improvements: removed duplicated videos, improved transcriptions, audio extraction (for Greek STT tasks).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
We present Da-TACOS: a dataset for cover song identification and understanding. It contains two subsets, namely the benchmark subset (for benchmarking cover song identification systems) and the cover analysis subset (for analyzing the links among cover songs), with pre-extracted features and metadata for 15,000 and 10,000 songs, respectively. The annotations included in the metadata are obtained with the API of SecondHandSongs.com. All audio files we use to extract features are encoded in MP3 format and their sample rate is 44.1 kHz. Da-TACOS does not contain any audio files. For the results of our analyses on modifiable musical characteristics using the cover analysis subset and our initial benchmarking of 7 state-of-the-art cover song identification algorithms on the benchmark subset, you can look at our publication.
For organizing the data, we use the structure of SecondHandSongs, where each song is called a ‘performance’ and each clique (cover group) is called a ‘work’. Based on this, the file names of the songs are their unique performance IDs (PID, e.g. P_22), and their labels with respect to their cliques are their work IDs (WID, e.g. W_14).
Metadata for each song includes the work and performance titles and artists, the release year, the work and performance IDs, and whether the performance is instrumental (see the example below).
In addition, we matched the original metadata with MusicBrainz to obtain MusicBrainz IDs (MBIDs), song lengths and genre/style tags. We would like to note that MusicBrainz-related information is not available for all the songs in Da-TACOS, and since we used only our metadata for matching, we include all possible MBIDs for a particular song.
To facilitate reproducibility in cover song identification (CSI) research, we propose a framework for feature extraction and benchmarking in our supplementary repository: acoss. The feature extraction component is designed to help CSI researchers find the most commonly used features for CSI in a single place. The parameter values we used to extract the features in Da-TACOS are shared in the same repository. Moreover, the benchmarking component includes our implementations of 7 state-of-the-art CSI systems, and we provide the results of an initial benchmarking of those 7 systems on the benchmark subset of Da-TACOS. We encourage other CSI researchers to contribute to acoss by implementing their favorite feature extraction algorithms and CSI systems, building up a knowledge base where CSI research can reach larger audiences.
The instructions for how to download and use the dataset are shared below. Please contact us if you have any questions or requests.
1. Structure
1.1. Metadata
We provide two metadata files that contain information about the benchmark subset and the cover analysis subset. Both metadata files are stored as Python dictionaries in .json format and have the same hierarchical structure.
An example of loading the metadata files in Python:
import json

with open('./da-tacos_metadata/da-tacos_benchmark_subset_metadata.json') as f:
    benchmark_metadata = json.load(f)
The Python dictionary obtained with the code above has the respective WIDs as keys. Each WID maps to the song dictionaries containing the metadata of the performances that belong to that clique. An example can be seen below:
"W_163992": { # work id
"P_547131": { # performance id of the first song belonging to the clique 'W_163992'
"work_title": "Trade Winds, Trade Winds",
"work_artist": "Aki Aleong",
"perf_title": "Trade Winds, Trade Winds",
"perf_artist": "Aki Aleong",
"release_year": "1961",
"work_id": "W_163992",
"perf_id": "P_547131",
"instrumental": "No",
"perf_artist_mbid": "9bfa011f-8331-4c9a-b49b-d05bc7916605",
"mb_performances": {
"4ce274b3-0979-4b39-b8a3-5ae1de388c4a": {
"length": "175000"
},
"7c10ba3b-6f1d-41ab-8b20-14b2567d384a": {
"length": "177653"
}
}
},
"P_547140": { # performance id of the second song belonging to the clique 'W_163992'
"work_title": "Trade Winds, Trade Winds",
"work_artist": "Aki Aleong",
"perf_title": "Trade Winds, Trade Winds",
"perf_artist": "Dodie Stevens",
"release_year": "1961",
"work_id": "W_163992",
"perf_id": "P_547140",
"instrumental": "No"
}
}
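Continuing the loading example above, the cliques and their member performances can be traversed directly (a minimal sketch):

# Count cliques (works) and performances in the benchmark subset
n_works = len(benchmark_metadata)
n_perfs = sum(len(perfs) for perfs in benchmark_metadata.values())
print(f"{n_works} cliques, {n_perfs} performances")

# List the performers and release years within one clique
for pid, song in benchmark_metadata["W_163992"].items():
    print(pid, song["perf_artist"], song["release_year"])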
1.2. Pre-extracted features
The list of features included in Da-TACOS can be seen below. All the features are extracted with the acoss repository, which uses open-source feature extraction libraries such as Essentia, LibROSA, and Madmom.
To facilitate the use of the dataset, we provide two options regarding the file structure.
1- In the da-tacos_benchmark_subset_single_files and da-tacos_coveranalysis_subset_single_files folders, the data is organized by clique, and a single file contains all the features for a given song.
{
"chroma_cens": numpy.ndarray,
"crema": numpy.ndarray,
"hpcp": numpy.ndarray,
"key_extractor": {
"key": numpy.str_,
"scale": numpy.str_,_
"strength": numpy.float64
},
"madmom_features": {
"novfn": numpy.ndarray,
"onsets": numpy.ndarray,
"snovfn": numpy.ndarray,
"tempos": numpy.ndarray
},
"mfcc_htk": numpy.ndarray,
"tags": list of (numpy.str_, numpy.str_)
"label": numpy.str_,
"track_id": numpy.str_
}
2- In the da-tacos_benchmark_subset_FEATURE and da-tacos_coveranalysis_subset_FEATURE folders, the data is also organized by clique, but each folder contains only one feature per song. For instance, if you want to test a system that uses HPCP features, you can download da-tacos_benchmark_subset_hpcp to access the pre-computed HPCP features. An example of the contents of those files can be seen below:
{
"hpcp": numpy.ndarray,
"label": numpy.str_,
"track_id": numpy.str_
}
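Assuming a per-feature file has been deserialized into a dictionary like the one above (the on-disk format is documented on the dataset page), cliques can be reconstructed from the label and track_id fields; a sketch:

from collections import defaultdict
import numpy as np

def group_by_clique(songs):
    # Group per-song feature dicts by their clique label ('label' field above)
    cliques = defaultdict(list)
    for song in songs:
        cliques[str(song["label"])].append(str(song["track_id"]))
    return cliques

def get_hpcp(song):
    # 2-D HPCP matrix, as listed in the structure above
    return np.asarray(song["hpcp"])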
2. Using the dataset
2.1. Requirements
git clone https://github.com/MTG/da-tacos.git
cd da-tacos
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
2.2. Downloading the data
The dataset is currently stored only in Google Drive (it will be uploaded to Zenodo soon) and can be downloaded from this link. We also provide a Python script that automatically downloads the folders you specify. Basic usage of this script can be seen below:
python download_da-tacos.py -h
usage: download_da-tacos.py [-h]
[--dataset {benchmark,coveranalysis,da-tacos}]
[--type {single_files,cens,crema,hpcp,key,madmom,mfcc,tags}]
[--source {gdrive,zenodo}]
[--outputdir OUTPUTDIR]
[--unpack]
[--remove]
Download script for Da-TACOS
optional arguments:
-h, --help show this help message and exit
--dataset {metadata,benchmark,coveranalysis,da-tacos}
which subset to download. 'da-tacos' option downloads
both subsets. the options other than 'metadata' will
download the metadata as well. (default: metadata)
--type {single_files,cens,crema,hpcp,key,madmom,mfcc,tags} [{single_files,cens,crema,hpcp,key,madmom,mfcc,tags} ...]
which folder to download. for downloading multiple
folders, you can enter multiple arguments (e.g. '--
type cens crema'). for detailed explanation, please
check https://mtg.github.io/da-tacos/ (default:
single_files)
--source {gdrive,zenodo}
from which source to download the files. you can
either download from Google Drive (gdrive) or from
Zenodo (zenodo) (default: gdrive)
--outputdir OUTPUTDIR
directory to store the dataset (default: ./)