6 datasets found
  1. Enhancing MovieLens Dataset: Enriching Recommendations with Audio Information, Transcriptions, and Metadata

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 16, 2023
    Cite
    Laura Sebastia (2023). Enhancing MovieLens Dataset: Enriching Recommendations with Audio Information, Transcriptions, and Metadata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8037432
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Laura Sebastia
    Victor Botti-Cebria
    Vanessa Moscardo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Many datasets are available for training and experimentation in the field of recommender systems. In the recommendation of audiovisual content, the MovieLens dataset is a prominent example: it focuses on the user-item relationship, providing actual interaction data between users and movies. However, although movies can be described by many characteristics, this dataset offers only limited information about movie genres.

    In this work, we propose enriching the MovieLens dataset by incorporating metadata available on the web (such as cast, description, and keywords) and movie trailers. From the trailers we extract audio information and generate a transcription for each one, introducing a crucial textual dimension to the dataset. The audio information was extracted through waveform and frequency analysis, followed by dimensionality reduction. Transcriptions were generated with the Whisper deep learning model. Finally, metadata was obtained from TMDB, and the BERT model was applied to extract embeddings.

    These additional attributes enrich the original dataset, enabling deeper and more precise analysis. This extended dataset could in turn drive significant advances in recommender systems, improving the user experience with more relevant movie recommendations tailored to each user's tastes and preferences.
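
    As a rough sketch of the transcription and embedding steps described above (assuming the openai-whisper and transformers packages plus ffmpeg; the file name is hypothetical, and this is not the authors' exact pipeline):

    import torch
    import whisper
    from transformers import AutoTokenizer, AutoModel

    # 1) Transcribe the trailer audio with Whisper ("trailer.mp3" is a placeholder; ffmpeg required).
    asr = whisper.load_model("base")
    transcript = asr.transcribe("trailer.mp3")["text"]

    # 2) Embed the transcript with BERT by mean-pooling the last hidden state.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    inputs = tokenizer(transcript, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        embedding = bert(**inputs).last_hidden_state.mean(dim=1).squeeze(0)

    print(embedding.shape)  # torch.Size([768])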

  2. sayha

    • huggingface.co
    Updated Jan 9, 2025
    Cite
    de (2025). sayha [Dataset]. https://huggingface.co/datasets/sadece/sayha
    Dataset updated
    Jan 9, 2025
    Authors
    de
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Sayha

      YouTube Video Audio and Subtitles Extraction
    

    Sayha is a tool designed to download YouTube videos and extract their audio and subtitle data. This can be particularly useful for creating datasets for machine learning projects, transcription services, or language studies.

      Features
    

    • Download YouTube videos.
    • Extract audio tracks from videos.
    • Retrieve and process subtitle files.
    • Prepare datasets for various applications.

      Installation

    Clone… See the full description on the dataset page: https://huggingface.co/datasets/sadece/sayha.
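
    The installation steps are truncated above. As an illustrative sketch of the download-and-extract workflow, using yt-dlp as a stand-in (an assumption: this excerpt does not show which downloader Sayha actually uses, and ffmpeg is required for the audio step):

    import yt_dlp

    opts = {
        "format": "bestaudio/best",
        "writesubtitles": True,     # fetch uploaded subtitles if present
        "writeautomaticsub": True,  # fall back to auto-generated captions
        "subtitleslangs": ["en"],
        "outtmpl": "data/%(id)s.%(ext)s",
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}  # needs ffmpeg
        ],
    }

    with yt_dlp.YoutubeDL(opts) as ydl:
        # Hypothetical video URL, for illustration only.
        ydl.download(["https://www.youtube.com/watch?v=VIDEO_ID"])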
    
  3. SLAC Dataset

    • zenodo.org
    zip
    Updated Mar 2, 2021
    Cite
    Cory McKay (2021). SLAC Dataset [Dataset]. http://doi.org/10.5281/zenodo.4571050
    Available download formats: zip
    Dataset updated
    Mar 2, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Cory McKay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This distribution includes details of the SLAC multimodal music dataset as well as features extracted from it. The dataset is intended to facilitate research comparing the relative musical influence of four different modalities: symbolic, lyrical, audio and cultural. SLAC was assembled by independently collecting, for each of its component musical pieces, a symbolic MIDI encoding, a lyrical text transcription, an audio MP3 recording and cultural information mined from the internet. It is important to emphasize the independence of how each of these components was collected; for example, the MIDI and MP3 encodings of each piece were collected entirely separately, and neither was generated from the other.

    Features have been extracted from each of the musical pieces in SLAC using the jMIR (http://jmir.sourceforge.net) feature extractor corresponding to each of the modalities: jSymbolic for symbolic, jLyrics for lyrics, jAudio for audio and jWebMiner2 for mining cultural data from search engines and Last.fm (https://www.last.fm).

    SLAC is quite small, consisting of only 250 pieces, owing to the difficulty of independently finding matching information in all four modalities. Although this size imposes certain limitations, SLAC is nonetheless the largest (and only) known dataset comprising all four independently collected modalities.

    The dataset is divided into ten genres, with 25 pieces belonging to each genre: Modern Blues, Traditional Blues, Baroque, Romantic, Bop, Swing, Hardcore Rap, Pop Rap, Alternative Rock and Metal. These can be collapsed into a 5-genre taxonomy, with 50 pieces per genre: Blues, Classical, Jazz, Rap and Rock. This facilitates experiments with both coarser and finer classes.

    SLAC was published at the ISMIR 2010 conference, and was itself an expansion of the SAC dataset (published at the ISMIR 2008 conference), which is identical except that it excludes the lyrics and lyrical features found in SLAC. Both ISMIR papers are included in this distribution.

    Due to copyright limitations, this distribution does not include the actual music or lyrics of the pieces comprising SLAC. It does, however, include details of the contents of the dataset as well as features extracted from each of its modalities using the jMIR software. These include the original features extracted for the 2010 ISMIR paper, as well as an updated set of symbolic features extracted in 2021 using the newer jSymbolic 2.2 feature extractor (published at ISMIR 2018). These jSymbolic 2.2 features include both the full MIDI feature set and a “conservative” feature set meant to limit potential biases due to encoding practice. Feature values are distributed as CSV files, Weka ARFF (https://www.cs.waikato.ac.nz/ml/weka/) files and ACE XML (http://jmir.sourceforge.net) files.
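
    As a sketch of how the distributed feature files might be used for the 5-genre classification experiment described above (the file name and column layout here are assumptions; check the distribution for the actual CSV structure):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical file name and columns; the real distribution documents its own.
    df = pd.read_csv("slac_jsymbolic_features.csv")
    X = df.drop(columns=["piece_id", "genre"])  # assumed identifier/label columns
    y = df["genre"]                             # e.g. Blues, Classical, Jazz, Rap, Rock

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"10-fold accuracy: {scores.mean():.3f}")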

  4. TerranAdjutant-1

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Yuxuan Zhang (2024). TerranAdjutant-1 [Dataset]. https://huggingface.co/datasets/yxzwayne/TerranAdjutant-1
    Available formats: Croissant, a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 12, 2024
    Authors
    Yuxuan Zhang
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains transcripts of all tracks spoken by the Adjutant in the StarCraft I Terran campaigns, covering both the base game and Brood War. Will 65 entries be enough to extract assistantship response patterns? I will find out.

      Curation Process
    

    Extracted all the sound files from the local StarCraft installation using a CascLib-based extractor I wrote, which produced a large set of .ogg files. StarCraft I file nomenclature is nice: for example, all files containing Adjutant… See the full description on the dataset page: https://huggingface.co/datasets/yxzwayne/TerranAdjutant-1.
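
    A minimal usage sketch with the Hugging Face datasets library (the split and column names are not shown in this excerpt, so inspect the loaded object):

    from datasets import load_dataset

    ds = load_dataset("yxzwayne/TerranAdjutant-1")
    print(ds)              # available splits and columns
    print(ds["train"][0])  # first transcript entry, assuming a 'train' split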

  5. Greek Elementary Sign Language Dataset

    • explore.openaire.eu
    • zenodo.org
    Updated Sep 8, 2021
    + more versions
    Cite
    Sotirios Chatzis; Vassilis Kourbetis (2021). Greek Elementary Sign Language Dataset [Dataset]. http://doi.org/10.5281/zenodo.5810460
    Dataset updated
    Sep 8, 2021
    Authors
    Sotirios Chatzis; Vassilis Kourbetis
    Description

    This dataset comprises the course material of the first years of elementary school in Greece. It includes 29,698 signed phrases covering the 33 issues of 13 distinct textbooks from years A, B and C of primary school. The Elementary Dataset consists of the following courses:

    • 9,507 videos of Greek Language (1st, 2nd and 3rd year)
    • 6,599 videos of Mathematics (1st, 2nd and 3rd year)
    • 4,163 videos of Anthology of Greek Literacy (1st, 2nd, 3rd and 4th year)
    • 5,528 videos of Environmental Studies (1st, 2nd and 3rd year)
    • 2,069 videos of History (3rd year)
    • 1,832 videos of Religious Study (3rd year)

    Version 2: Major Features and Improvements

    • Removed duplicated videos
    • Improved transcriptions
    • Audio extraction (for Greek STT tasks)

  6. Data from: Da-TACOS: A Dataset for Cover Song Identification and Understanding

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Apr 27, 2021
    Cite
    Furkan Yesiler; Chris Tralie; Albin Correya; Diego F. Silva; Philip Tovstogan; Emilia Gómez; Xavier Serra (2021). Da-TACOS: A Dataset for Cover Song Identification and Understanding [Dataset]. http://doi.org/10.5281/zenodo.4717628
    Available download formats: zip
    Dataset updated
    Apr 27, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Furkan Yesiler; Chris Tralie; Albin Correya; Diego F. Silva; Philip Tovstogan; Emilia Gómez; Xavier Serra
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    We present Da-TACOS: a dataset for cover song identification and understanding. It contains two subsets, namely the benchmark subset (for benchmarking cover song identification systems) and the cover analysis subset (for analyzing the links among cover songs), with pre-extracted features and metadata for 15,000 and 10,000 songs, respectively. The annotations included in the metadata are obtained with the API of SecondHandSongs.com. All audio files we use to extract features are encoded in MP3 format and their sample rate is 44.1 kHz. Da-TACOS does not contain any audio files. For the results of our analyses on modifiable musical characteristics using the cover analysis subset and our initial benchmarking of 7 state-of-the-art cover song identification algorithms on the benchmark subset, you can look at our publication.

    For organizing the data, we use the structure of SecondHandSongs where each song is called a ‘performance’, and each clique (cover group) is called a ‘work’. Based on this, the file names of the songs are their unique performance IDs (PID, e.g. P_22), and their labels with respect to their cliques are their work IDs (WID, e.g. W_14).

    Metadata for each song includes

    • performance title,
    • performance artist,
    • work title,
    • work artist,
    • release year,
    • SecondHandSongs.com performance ID,
    • SecondHandSongs.com work ID,
    • whether the song is instrumental or not.

    In addition, we matched the original metadata with MusicBrainz to obtain MusicBrainz IDs (MBIDs), song lengths and genre/style tags. Note that MusicBrainz-related information is not available for all songs in Da-TACOS, and since we matched using only our own metadata, we include all possible MBIDs for a particular song.

    To facilitate reproducibility in cover song identification (CSI) research, we propose a framework for feature extraction and benchmarking in our supplementary repository, acoss. The feature extraction component is designed to help CSI researchers find the most commonly used features for CSI in a single place. The parameter values we used to extract the features in Da-TACOS are shared in the same repository. Moreover, the benchmarking component includes our implementations of 7 state-of-the-art CSI systems, and we provide the results of an initial benchmarking of those systems on the benchmark subset of Da-TACOS. We encourage other CSI researchers to contribute to acoss by implementing their favorite feature extraction algorithms and CSI systems, building a shared knowledge base through which CSI research can reach larger audiences.

    The instructions for how to download and use the dataset are shared below. Please contact us if you have any questions or requests.

    1. Structure

    1.1. Metadata

    We provide two metadata files that contain information about the benchmark subset and the cover analysis subset. Both metadata files are stored in .json format, load as Python dictionaries, and share the same hierarchical structure.

    An example to load the metadata files in python:

    import json
    
    with open('./da-tacos_metadata/da-tacos_benchmark_subset_metadata.json') as f:
      benchmark_metadata = json.load(f)
    

    The Python dictionary obtained with the code above has the WIDs as keys; each key maps to the song dictionaries containing the metadata of the songs that belong to that clique. An example can be seen below:

    "W_163992": { # work id
      "P_547131": { # performance id of the first song belonging to the clique 'W_163992'
        "work_title": "Trade Winds, Trade Winds",
        "work_artist": "Aki Aleong",
        "perf_title": "Trade Winds, Trade Winds",
        "perf_artist": "Aki Aleong",
        "release_year": "1961",
        "work_id": "W_163992",
        "perf_id": "P_547131",
        "instrumental": "No",
        "perf_artist_mbid": "9bfa011f-8331-4c9a-b49b-d05bc7916605",
        "mb_performances": {
          "4ce274b3-0979-4b39-b8a3-5ae1de388c4a": {
            "length": "175000"
          },
          "7c10ba3b-6f1d-41ab-8b20-14b2567d384a": {
            "length": "177653"
          }
        }
      },
      "P_547140": { # performance id of the second song belonging to the clique 'W_163992'
        "work_title": "Trade Winds, Trade Winds",
        "work_artist": "Aki Aleong",
        "perf_title": "Trade Winds, Trade Winds",
        "perf_artist": "Dodie Stevens",
        "release_year": "1961",
        "work_id": "W_163992",
        "perf_id": "P_547140",
        "instrumental": "No"
      }
    }
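
    Continuing from the loading example above, a small illustrative snippet that flattens this structure into (performance, work) pairs, e.g. for building training labels:

    # Flatten the metadata into (perf_id, work_id, perf_artist) rows.
    pairs = []
    for wid, performances in benchmark_metadata.items():
        for pid, info in performances.items():
            pairs.append((pid, wid, info.get("perf_artist")))

    print(len(pairs), pairs[:2])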
    

    1.2. Pre-extracted features

    The list of features included in Da-TACOS can be seen below. All features are extracted with the acoss repository, which uses open-source feature extraction libraries such as Essentia, LibROSA, and Madmom.

    To facilitate the use of the dataset, we provide two options regarding the file structure.

    1- In da-tacos_benchmark_subset_single_files and da-tacos_coveranalysis_subset_single_files folders, we organize the data based on their respective cliques, and one file contains all the features for that particular song.

    {
      "chroma_cens": numpy.ndarray,
      "crema": numpy.ndarray,
      "hpcp": numpy.ndarray,
      "key_extractor": {
        "key": numpy.str_,
        "scale": numpy.str_,
        "strength": numpy.float64
      },
      "madmom_features": {
        "novfn": numpy.ndarray,
        "onsets": numpy.ndarray,
        "snovfn": numpy.ndarray,
        "tempos": numpy.ndarray
      },
      "mfcc_htk": numpy.ndarray,
      "tags": list of (numpy.str_, numpy.str_),
      "label": numpy.str_,
      "track_id": numpy.str_
    }

    2- In da-tacos_benchmark_subset_FEATURE and da-tacos_coveranalysis_subset_FEATURE folders, the data is likewise organized by clique, but each of these folders contains only one feature per song. For instance, if you want to test a system that uses HPCP features, you can download da-tacos_benchmark_subset_hpcp to access the pre-computed HPCP features. An example of the contents of those files can be seen below:

    {
      "hpcp": numpy.ndarray,
      "label": numpy.str_,
      "track_id": numpy.str_
    }
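
    As an illustration of how such per-feature records might be consumed once deserialized into Python dicts like the one above (the on-disk serialization is not specified in this excerpt), here is a crude global-profile HPCP comparison; real CSI systems, such as those benchmarked in acoss, are far more involved:

    import numpy as np

    def global_hpcp_profile(hpcp: np.ndarray) -> np.ndarray:
        """Average a frame-wise HPCP matrix (frames x pitch-class bins) into a unit-norm profile."""
        profile = hpcp.mean(axis=0)
        return profile / (np.linalg.norm(profile) + 1e-9)

    def oti_similarity(hpcp_a: np.ndarray, hpcp_b: np.ndarray) -> float:
        """Best cosine similarity over all circular transpositions (OTI-style), assuming 12 bins."""
        a, b = global_hpcp_profile(hpcp_a), global_hpcp_profile(hpcp_b)
        return max(float(a @ np.roll(b, k)) for k in range(12))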
    
    

    2. Using the dataset

    2.1. Requirements

    • Python 3.6+
    • Create a virtual environment and install the requirements:
      git clone https://github.com/MTG/da-tacos.git
      cd da-tacos
      python3 -m venv venv
      source venv/bin/activate
      pip install -r requirements.txt
      

    2.2. Downloading the data

    The dataset is currently stored only on Google Drive (it will be uploaded to Zenodo soon) and can be downloaded from this link. We also provide a Python script that automatically downloads the folders you specify. Basic usage of this script can be seen below:

    python download_da-tacos.py -h
    

    usage: download_da-tacos.py [-h]
                  [--dataset {metadata,benchmark,coveranalysis,da-tacos}]
                  [--type {single_files,cens,crema,hpcp,key,madmom,mfcc,tags}]
                  [--source {gdrive,zenodo}]
                  [--outputdir OUTPUTDIR] [--unpack] [--remove]

    Download script for Da-TACOS

    optional arguments:
      -h, --help            show this help message and exit
      --dataset {metadata,benchmark,coveranalysis,da-tacos}
                            which subset to download. the 'da-tacos' option downloads both subsets; options other than 'metadata' download the metadata as well. (default: metadata)
      --type {single_files,cens,crema,hpcp,key,madmom,mfcc,tags} [{single_files,cens,crema,hpcp,key,madmom,mfcc,tags} ...]
                            which folder to download. to download multiple folders, pass multiple arguments (e.g. '--type cens crema'). for a detailed explanation, see https://mtg.github.io/da-tacos/ (default: single_files)
      --source {gdrive,zenodo}
                            which source to download the files from: Google Drive (gdrive) or Zenodo (zenodo) (default: gdrive)
      --outputdir OUTPUTDIR
                            directory to store the dataset (default: ./)

