2 datasets found
  1. YourMT3 Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Jul 4, 2024
    Cite
    Sungkyun Chang; Emmanouil Benetos; Holger Kirchhoff; Simon Dixon (2024). YourMT3 Dataset [Dataset]. https://paperswithcode.com/dataset/yourmt3-dataset
    Dataset updated
    Jul 4, 2024
    Authors
    Sungkyun Chang; Emmanouil Benetos; Holger Kirchhoff; Simon Dixon
    Description

    We redistribute a suite of datasets as part of the YourMT3 project. The license for redistribution is attached.

    The YourMT3 Dataset includes:

    • Slakh
    • MusicNet (original and EM)
    • MAPS (not used for training)
    • Maestro
    • GuitarSet
    • ENST-drums
    • EGMD
    • MIR-ST500 (restricted access)
    • CMedia (restricted access)
    • RWC-Pop (Bass and Full; restricted access)
    • URMP
    • IDMT-SMT-Bass

  2. YourMT3 dataset (Part 1)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 18, 2023
    + more versions
    Cite
    Dixon, Simon (2023). YourMT3 dataset (Part 1) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7793341
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    Sungkyun Chang
    Dixon, Simon
    Benetos, Emmanouil
    License

    Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    UNDER CONSTRUCTION

    About this version:

    This variant of the MusicNet dataset has been resampled to 16 kHz mono 16-bit WAV, which makes it more suitable for audio processing tasks that require lower sampling rates. We redistribute this data as part of the YourMT3 project, and the sketch after this paragraph illustrates the conversion. The license for redistribution is attached.
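
    For reference, converting audio to this format is straightforward with common tools. The sketch below is a minimal illustration using librosa and soundfile, with hypothetical file paths; it is not necessarily the pipeline the authors used.

        # Minimal sketch: resample a MusicNet recording to 16 kHz mono 16-bit WAV.
        # Paths and library choice (librosa + soundfile) are assumptions for
        # illustration, not the authors' actual preprocessing pipeline.
        import librosa
        import soundfile as sf

        y, sr = librosa.load("musicnet/1729.wav", sr=16000, mono=True)  # resample and downmix
        sf.write("musicnet_16k/1729.wav", y, sr, subtype="PCM_16")      # write 16-bit PCM WAV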

    Moreover, this version of the dataset includes various split options, derived from previous works on automatic music transcription, as a Python dictionary (see README.md). Below is a brief description of the available split options:

        MUSICNET_SPLIT_INFO = {
            # The first 300 songs are the synth dataset; the remaining 300 are the acoustic dataset.
            'train_mt3': [],
            # Note: this is not the synthetic dataset of EM (MIDI Pop 80K), nor pitch-augmented.
            # It is just a rendering of the MusicNet MIDI, using the MT3 authors' split.
            # Not sure if they actually used this (maybe not).
            'train_mt3_synth': [],
            'train_mt3_acoustic': [],
            'validation_mt3': [1733, 1765, 1790, 1818, 2160, 2198, 2289, 2300, 2308, 2315, 2336, 2466, 2477, 2504, 2611],
            'validation_mt3_synth': [1733, 1765, 1790, 1818, 2160, 2198, 2289, 2300, 2308, 2315, 2336, 2466, 2477, 2504, 2611],
            'validation_mt3_acoustic': [1733, 1765, 1790, 1818, 2160, 2198, 2289, 2300, 2308, 2315, 2336, 2466, 2477, 2504, 2611],
            'test_mt3_acoustic': [1729, 1776, 1813, 1893, 2118, 2186, 2296, 2431, 2432, 2487, 2497, 2501, 2507, 2537, 2621],
            # The first 327 songs are the synth dataset; the remaining 327 are the acoustic dataset.
            'train_thickstun': [],
            'test_thickstun': [1819, 2303, 2382],
            # 293 tracks: the MT3 train set minus 7 missing tracks [2194, 2211, 2227, 2230, 2292, 2305, 2310] (ours).
            'train_mt3_em': [],
            'validation_mt3_em': [1733, 1765, 1790, 1818, 2160, 2198, 2289, 2300, 2308, 2315, 2336, 2466, 2477, 2504, 2611],  # ours
            'test_mt3_em': [1729, 1776, 1813, 1893, 2118, 2186, 2296, 2431, 2432, 2487, 2497, 2501, 2507, 2537, 2621],  # ours
            # 317 tracks: the whole set minus the 7 missing tracks above minus the 6 test_em tracks.
            'train_em_table2': [],
            # Strings and winds from Cheuk's split, using EM annotations.
            'test_em_table2': [2191, 2628, 2106, 2298, 1819, 2416],
            # Strings and winds from Cheuk's split, using Thickstun's annotations.
            'test_cheuk_table2': [2191, 2628, 2106, 2298, 1819, 2416],
        }
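
    For example, the dictionary can be used to gather the audio files belonging to one split. The sketch below assumes WAV files named by MusicNet track ID in a flat directory; that layout is an assumption for illustration, not documented structure.

        # Minimal sketch: collect the validation files for the MT3 split,
        # assuming files are named "<track_id>.wav" (a hypothetical layout).
        from pathlib import Path

        audio_dir = Path("musicnet_16k")  # hypothetical directory
        val_files = [audio_dir / f"{track_id}.wav"
                     for track_id in MUSICNET_SPLIT_INFO['validation_mt3']]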

    About MusicNet:

    The MusicNet dataset was originally released in 2016 by Thickstun et al. in "Learning Features of Music from Scratch". It is a collection of music recordings annotated with labels for various tasks, such as automatic music transcription, instrument recognition, and genre classification. The original dataset contains 330 recordings (roughly 34 hours of audio), sourced from various public-domain recordings of classical music, and is labeled with instrument activations and note-wise annotations.

    About MusicNet EM:

    MusicNetEM is a set of refined labels for the MusicNet dataset, in the form of MIDI files. They are aligned with the recordings, with onset timing accurate to within 32 ms. They were created using an expectation-maximization (EM) process similar to the one described in Ben Maman and Amit H. Bermano, "Unaligned Supervision for Automatic Music Transcription in The Wild". Their split (Table 2 of that paper) derives from another paper, Kin Wai Cheuk et al., "ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data".
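
    Since the labels are plain MIDI, note onsets can be read with any MIDI parser. The sketch below uses pretty_midi as one possible choice, with a hypothetical file path.

        # Minimal sketch: read note onsets from a MusicNetEM MIDI label file.
        # pretty_midi is an assumed tool choice; the path is hypothetical.
        import pretty_midi

        midi = pretty_midi.PrettyMIDI("musicnet_em/1729.mid")
        onsets = sorted(
            (note.start, inst.program, note.pitch)
            for inst in midi.instruments
            for note in inst.notes
        )
        print(onsets[:5])  # (onset time in seconds, MIDI program, MIDI pitch)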

    License:

    CC-BY-4.0

